Quadratic assignment problem

Author: Thomas Kueny, Eric Miller, Natasha Rice, Joseph Szczerba, David Wittmann (SysEn 5800 Fall 2020)

  • 1 Introduction
  • 2 Theory, Methodology, and/or Algorithmic Discussions
      • 2.1 Koopmans-Beckmann Mathematical Formulation
      • 2.2 Optimization Problem
      • 2.3 Computational Complexity
      • 2.4 Algorithmic Discussions
      • 2.5 Branch and Bound Procedures
      • 2.6 Linearizations
  • 3 Numerical Example
      • 3.1 QAP with 3 Facilities
  • 4 Applications
      • 4.1 Inter-plant Transportation Problem
      • 4.2 The Backboard Wiring Problem
      • 4.3 Hospital Layout
      • 4.4 Exam Scheduling System
  • 5 Conclusion
  • 6 References

Introduction

The Quadratic Assignment Problem (QAP), introduced by Koopmans and Beckmann in 1957 [1], is a mathematical optimization model created to describe the location of indivisible economic activities. An NP-complete problem, this model can be applied to many other optimization problems outside the field of economics. It has been used to optimize backboard wiring, inter-plant transportation, hospital layout, and exam scheduling, along with many other applications not described within this page.

Theory, Methodology, and/or Algorithmic Discussions

Koopmans-Beckmann Mathematical Formulation

Economists Koopmans and Beckmann began their investigation of the QAP to ascertain the optimal method of locating important economic resources in a given area. The Koopmans-Beckmann formulation of the QAP aims to assign facilities to locations in order to minimize the overall cost. Below is the Koopmans-Beckmann formulation of the QAP as described by neos-guide.org. [2]

Quadratic Assignment Problem Formulation

Given n facilities and n locations, let the flow matrix

{\displaystyle F=(F_{ij})}

record the flow of materials from facility i to facility j, and let the distance matrix D = (D_{ij}) record the distance from location i to location j. For two n-by-n matrices

{\displaystyle A,B}

the inner product is defined as

{\displaystyle \langle A,B\rangle =\sum _{i=1}^{n}\sum _{j=1}^{n}A_{ij}B_{ij}}

Note: The true objective cost function only requires summing the entries above the diagonal in the matrix comprised of elements

{\displaystyle F_{i,j}(X_{\phi }DX_{\phi }^{T})_{i,j}}

Since this matrix is symmetric with zeroes on the diagonal, dividing by 2 removes the double count of each element to give the correct cost value. See the Numerical Example section for an illustration of this note.

Optimization Problem

With all of this information, the QAP can be summarized as the following minimization over the set P of n-by-n permutation matrices:

{\displaystyle \min _{X\in P}\langle F,XDX^{T}\rangle }

Computational Complexity

QAP belongs to the class of problems known as NP-complete, and is thus computationally complex. QAP's NP-completeness was proven by Sahni and Gonzalez in 1976, who showed that, among combinatorial optimization problems, QAP is the “hardest of the hard”. [2]

Algorithmic Discussions

While an algorithm that can solve QAP in polynomial time is unlikely to exist, there are three primary methods for acquiring the optimal solution to a QAP problem:

  • Dynamic Programming
  • Cutting Planes
  • Branch and Bound

Branch and Bound Procedures

The third method, branch and bound, has proven to be the most effective in solving QAP, although instances with n > 15 remain virtually unsolvable in practice.

The Branch and Bound method was first proposed by Ailsa Land and Alison Doig in 1960 and is the most commonly used tool for solving NP-hard optimization problems.

A branch-and-bound algorithm consists of a systematic enumeration of candidate solutions by means of state space search: the set of candidate solutions is thought of as forming a rooted tree with the full set at the root. The algorithm explores branches of this tree, which represent subsets of the solution set. Before one lists all of the candidate solutions of a branch, the branch is checked against upper and lower estimated bounds on the optimal solution, and the branch is eliminated if it cannot produce a better solution than the best one found so far by the algorithm.
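To make the pruning idea concrete, here is a minimal branch-and-bound sketch in Python for a small Koopmans-Beckmann instance. The flow and distance matrices are made up purely for illustration, and the bound used is simply the cost of the partial assignment (a valid lower bound because all costs are nonnegative), rather than the stronger bounds used in serious QAP solvers.

```python
# Small Koopmans-Beckmann instance: flow F and distance D (values assumed
# for illustration only).
F = [[0, 3, 1, 2],
     [3, 0, 1, 4],
     [1, 1, 0, 2],
     [2, 4, 2, 0]]
D = [[0, 22, 53, 53],
     [22, 0, 40, 62],
     [53, 40, 0, 55],
     [53, 62, 55, 0]]
n = 4

best_cost = float("inf")
best_perm = None

def branch(partial, used, cost_so_far):
    """DFS over partial assignments (facility i -> location partial[i]).
    A branch is pruned when its partial cost already reaches the best
    complete solution found so far."""
    global best_cost, best_perm
    i = len(partial)
    if i == n:
        if cost_so_far < best_cost:
            best_cost, best_perm = cost_so_far, tuple(partial)
        return
    for loc in range(n):
        if loc in used:
            continue
        # Interaction cost added by placing facility i at loc, against all
        # facilities already placed (both flow directions).
        delta = sum(F[i][k] * D[loc][partial[k]] + F[k][i] * D[partial[k]][loc]
                    for k in range(i))
        if cost_so_far + delta < best_cost:   # bound check: prune otherwise
            partial.append(loc)
            used.add(loc)
            branch(partial, used, cost_so_far + delta)
            used.remove(loc)
            partial.pop()

branch([], set(), 0)
print(best_perm, best_cost)
```

Even this naive bound prunes a large share of the 4! = 24 leaves; real solvers replace it with bounds such as the Gilmore-Lawler bound.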

Linearizations

The first attempts to solve the QAP eliminated the quadratic term in the objective function of

{\displaystyle \min _{\phi \in S_{n}}\sum _{i=1}^{n}\sum _{j=1}^{n}F_{ij}D_{\phi (i)\phi (j)}+\sum _{i=1}^{n}b_{i\phi (i)}}

in order to transform the problem into a (mixed) 0-1 linear program. The objective function is usually linearized by introducing new variables and new linear (and binary) constraints. Then existing methods for (mixed) linear integer programming (MILP) can be applied. The very large number of new variables and constraints, however, usually poses an obstacle for efficiently solving the resulting linear integer programs. MILP formulations provide LP relaxations of the problem which can be used to compute lower bounds.
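To illustrate what such a linearization looks like, here is one textbook-style (Lawler-type) linearization, written for illustration rather than taken from any specific formulation in the references. Each product x_{ij}x_{kl} of binary assignment variables is replaced by a new variable y_{ijkl}:

```latex
\begin{aligned}
\min \quad & \sum_{i,j,k,l} F_{ik}\, D_{jl}\, y_{ijkl} \\
\text{s.t.} \quad & \sum_{i} x_{ij} = 1 \quad \forall j, \qquad
                    \sum_{j} x_{ij} = 1 \quad \forall i, \\
& y_{ijkl} \geq x_{ij} + x_{kl} - 1 \quad \forall i,j,k,l, \\
& x_{ij} \in \{0,1\}, \qquad y_{ijkl} \geq 0 .
\end{aligned}
```

Because the objective coefficients are nonnegative and the problem is a minimization, y_{ijkl} equals x_{ij}x_{kl} at any optimum, so this MILP is equivalent to the original QAP, and its LP relaxation yields a lower bound.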

Numerical Example

QAP with 3 Facilities

{\displaystyle D={\begin{bmatrix}0&5&6\\5&0&3.6\\6&3.6&0\end{bmatrix}}}
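To make the halving note from the formulation section concrete, here is a minimal sketch in Python using the distance matrix above together with an assumed flow matrix (a flow matrix is not shown on this page, so the values below are purely illustrative):

```python
from itertools import permutations

# Distance matrix from the example above.
D = [[0, 5, 6],
     [5, 0, 3.6],
     [6, 3.6, 0]]

# Flow matrix: NOT given in the source; assumed here for illustration.
F = [[0, 10, 4],
     [10, 0, 8],
     [4, 8, 0]]

def cost(perm):
    """Objective <F, X D X^T> / 2, where perm maps facility i to location perm[i].

    (X D X^T)_{ij} = D[perm[i]][perm[j]]; the division by 2 removes the
    double count from the symmetric, zero-diagonal matrix described above."""
    n = len(perm)
    total = sum(F[i][j] * D[perm[i]][perm[j]]
                for i in range(n) for j in range(n))
    return total / 2

best = min(permutations(range(3)), key=cost)
print(best, cost(best))  # -> (2, 1, 0) 100.0
```

Enumerating all 3! = 6 assignments is trivial here; the same approach becomes hopeless quickly, since the number of assignments grows factorially in n.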

Applications

Inter-plant Transportation Problem

The QAP was first introduced by Koopmans and Beckmann to address how economic decisions could be made to optimize the transportation costs of goods between manufacturing plants and locations. [1] Factoring in both the location of each manufacturing plant and the volume of goods moved between locations is what distinguishes the QAP from simpler linear optimization problems such as the Knapsack Problem.

The Backboard Wiring Problem

As the QAP is focused on minimizing the cost of traveling from one location to another, it is an ideal approach to determining the placement of components in many modern electronics. Leon Steinberg proposed a QAP solution to optimize the layout of elements on a backboard by minimizing the total amount of wiring required. [4]

When defining the problem Steinberg states that we have a set of n elements

{\displaystyle E=\left\{E_{1},E_{2},...,E_{n}\right\}}

as well as a set of r points

{\displaystyle P_{1},P_{2},...,P_{r}}

In his paper he derives the below formula:

{\displaystyle \min \sum _{1\leq i<j\leq n}C_{ij}\,d_{s(i)s(j)}}

In his paper, Steinberg considers a backboard with a 9-by-4 array, allowing for 36 potential positions for the 34 components that needed to be placed on the backboard. For the calculation, he selected a random initial placement s1 and chose a random family of 25 unconnected sets.

The initial placement of components is shown below:

[Figure: initial placement of components on the backboard]

After the initial placement of elements, an additional 35 iterations were required to reach the final optimized backboard layout, for a total of 59 iterations and a final wire length of 4,969.440.

[Figure: final optimized backboard layout]

Hospital Layout

Building new hospitals was a common event in 1977 when Alwalid N. Elshafei wrote his paper on "Hospital Layout as a Quadratic Assignment Problem". [5] Given the high initial cost to construct and staff a hospital, it is important to ensure that it operates as efficiently as possible. Elshafei's paper was commissioned to create an optimization formula that locates clinics within a building in such a way as to minimize the total distance that patients travel within the hospital throughout the year. In a study of a major hospital in Cairo, he determined that the outpatient ward was acting as a bottleneck in the hospital and focused his efforts on optimizing its 17 departments.

Elshafei identified the following QAP to determine where clinics should be placed:

{\displaystyle min\sum _{i,j}\sum _{k,q}f_{ik}d_{jq}y_{ij}y_{kq}}

The Cairo hospital had 17 clinics plus one receiving and recording room, for a total of 18 facilities. By running the above optimization, Elshafei was able to reduce the total distance traveled per year to 11,281,887 from the 13,973,298 of the original hospital layout.

Exam Scheduling System

The scheduling system uses matrices for exams, time slots, and rooms, with the goal of reducing the rate of schedule conflicts. To accomplish this, exams with the highest number of cross-faculty students are prioritized in the schedule, followed by exams with the highest number of cross-program students, and finally those with the highest number of repeating students; at each stage, the group with the highest number of students is prioritized. [6]

Note that the number of possible assignments grows as {\displaystyle n!}, the number of permutations of n items, which is why exact methods become impractical even for moderately sized instances.

References

  • [1] Koopmans, T., & Beckmann, M. (1957). Assignment Problems and the Location of Economic Activities. Econometrica, 25(1), 53-76. doi:10.2307/1907742
  • [2] Quadratic Assignment Problem. (2020). Retrieved December 14, 2020, from https://neos-guide.org/content/quadratic-assignment-problem
  • [3] Burkard, R. E., Çela, E., Pardalos, P. M., & Pitsoulis, L. S. (2013). The Quadratic Assignment Problem. https://www.opt.math.tugraz.at/~cela/papers/qap_bericht.pdf
  • [4] Steinberg, L. (1961). The Backboard Wiring Problem: A Placement Algorithm. SIAM Review, 3(1), 37.
  • [5] Elshafei, A. N. (1977). Hospital Layout as a Quadratic Assignment Problem. Operational Research Quarterly, 28(1), 167. doi:10.2307/300878
  • [6] Muktar, D., & Ahmad, Z. M. (2014). Examination Scheduling System Based on Quadratic Assignment.


Quadratic Assignment Problem (QAP)

The Quadratic Assignment Problem (QAP) is an optimization problem that deals with assigning a set of facilities to a set of locations, considering the pairwise distances and flows between them.

The problem is to find the assignment that minimizes the total cost or distance, taking into account both the distances and the flows.

The QAP can be formulated with a quadratic objective function over the distance and flow matrices, together with constraints ensuring that each facility is assigned to exactly one location and each location is assigned exactly one facility.

The QAP is a well-known example of an NP-hard problem , which means that for larger instances, computing the optimal solution can be intractable. As a result, many algorithms and heuristics have been created to quickly find approximate solutions.

There are various types of algorithms for different problem structures, such as:

  • Exact algorithms
  • Approximation algorithms
  • Metaheuristics like genetic algorithms and simulated annealing
  • Specialized algorithms

Example: Given four facilities (F1, F2, F3, F4) and four locations (L1, L2, L3, L4), we have a cost matrix representing the pairwise distances or costs between facilities, and a flow matrix representing the interaction or flow between locations. Find the assignment that minimizes the total cost based on the interactions between facilities and locations. Each facility must be assigned to exactly one location, and each location can accommodate only one facility.

Facilities cost matrix:

Flow matrix:

To solve the QAP, various optimization techniques can be used, such as mathematical programming, heuristics, or metaheuristics. These techniques aim to explore the search space and find the optimal or near-optimal solution.

The solution to the QAP will provide an assignment of facilities to locations that minimizes the overall cost.

The brute-force solution generates all possible permutations of the assignment and calculates the total cost of each. The optimal assignment is the one that results in the minimum total cost.

To calculate the total cost, we look at each pair of facilities (facility1, facility2) in the assignment and their respective locations (location1, location2). We then multiply the cost between facility1 and facility2 (facilities[facility1][facility2]) by the flow from location1 to location2 (locations[location1][location2]). This is done for all pairs of facilities in the assignment, and the costs are summed.
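The enumeration just described (generate every permutation, score it with the cost-times-flow sum, keep the minimum) can be sketched in Python. Note that the matrices below are placeholders: the source's actual cost and flow matrices were lost with the page's images, so this sketch will not reproduce the stated total of 44.

```python
from itertools import permutations

# Placeholder matrices (NOT the source's values, which were lost when the
# page's images were stripped).
facilities = [[0, 2, 3, 1],
              [2, 0, 1, 4],
              [3, 1, 0, 2],
              [1, 4, 2, 0]]   # cost between each pair of facilities
locations  = [[0, 1, 2, 3],
              [1, 0, 1, 2],
              [2, 1, 0, 1],
              [3, 2, 1, 0]]   # flow between each pair of locations

def total_cost(assignment):
    # assignment[f] is the location assigned to facility f; sum
    # facilities[f1][f2] * locations[location1][location2] over all pairs.
    return sum(facilities[f1][f2] * locations[assignment[f1]][assignment[f2]]
               for f1 in range(4) for f2 in range(4))

best = min(permutations(range(4)), key=total_cost)
print(best, total_cost(best))
```

For n facilities this enumerates n! assignments, which is why exact enumeration is only practical for very small instances.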

Overall, the output tells us that assigning facilities to locations as F1->L1, F3->L2, F2->L3, and F4->L4 results in the minimum total cost of 44. That is, Facility 1 is assigned to Location 1, Facility 3 to Location 2, Facility 2 to Location 3, and Facility 4 to Location 4, yielding the lowest cost based on the given cost and flow matrices. This example demonstrates the process of finding the optimal assignment by considering the costs and flows associated with each facility and location.

Applications of the QAP include facility location, logistics, scheduling, and network architecture, all of which require effective resource allocation and arrangement.



Quadratic Assignment Procedure (QAP) in statnet

Phil Murphy

Practicum Setup

This practicum is still in the early stages, so you may find it a little terse at the moment. That will be rectified in future iterations. For now, we will run through a few options that you are likely to find helpful if you plan to use statistical analysis to confirm any suspicions that you developed during background research, or perhaps during your exploratory work.

This part is important. Please be sure to do this part again

Create a folder labeled “QAP_Practicum” someplace on your computer, such as your Desktop or wherever you will be able to easily find it again. Then, set your working directory to that folder in R Studio:

  • Set Working Directory
  • Choose Directory…

Well-intentioned tip for those just starting out in R Studio: You will notice that the History tab in R Studio now has a line that looks something like setwd("~/Desktop/Practicum 3") . If you highlight that and click “To Source”, that line will be included in your script, so you will not need to manually set your working directory every time you go through an older script. There is, of course, a much better way to do this for the long term. But, this will save you a little time in the interim.

Getting Started

For the sake of familiarity and expediency, we’ll, once again, stick with Padgett’s Florentine families data. The object of this practicum is to familiarize you with a variety of statistical tools that should be helpful to one or more of your projects. For that reason, it is likely that some of these analyses may seem a little contrived or odd. Please understand that the examples are being provided to demonstrate how you may use R to conduct these analyses. Bear with me, then use these with your own data for something a little more worthy of inference!

Data for this Practicum

Load Padgett data. These have been saved as data appropriate for statnet . To access them, go to the Network Data link on the course website and download the padbus.rda and padmar.rda files for business ties and marriage ties, respectively.

These will load data objects named “Padgett_Business” and “Padgett_Marriage” in R’s memory.

You will also need to convert an attribute that we used in the past to a network. Although this practicum will be conducted almost entirely in statnet , igraph offers the most efficient means of converting a two-mode network to one-mode. So, we’ll use igraph to preprocess our data and then convert the result to a statnet object using the intergraph package.

To begin, load the party.csv file. We have used this before to study brokerage in Practicum 6, so you can find it in the Practicum 6 folder on the Network Data page of the course website.

Now that your data are ready for use, move on to statnet .

Load statnet

All of the following analyses may be found in statnet . This package was developed on top of Carter Butts’ network and sna packages. Each of those packages is a dependency of statnet , so they load with it. Below, I may refer to various functions as residing in statnet , despite the fact that they are actually part of sna or network . The reference to statnet is therefore intended to encompass all three packages. When in doubt, just use the help function ( ? ) to learn more about any individual command that we cover below.

Quadratic Assignment Procedure (QAP)

Quadratic assignment procedure (QAP) is similar to CUG, in that it uses simulation to generate a distribution of hypothetical networks. But QAP controls for network structure, whereas CUG controls for size, the number of edges, or the dyad census. (Note: CUG can be made to condition on other properties. But size, edges, and dyad census are the options available through statnet .)

QAP controls for network structure by permuting the rows and columns of the network (i.e., relabeling the nodes) over a large number of iterations - usually around 1000. For more details, check out chapter 9 of the book.

QAP is useful for running a variety of statistical functions. Below are three of the options that you have available in statnet .

QAP Correlation

Did the Florentine families base their business dealings on the marriage ties? (Or maybe their marriages are based on their business ties?)

Marriage and business ties are moderately correlated (r=0.37).

The correlation is significant at the 0.05 alpha level. We know this because fewer than 5% of the permuted networks - in this case, none of them - exhibited a correlation coefficient as extreme as the value we calculated for these networks.

For a visual of this comparison, see below.

QAP Linear Regression

We are finally ready to use the “Padgett_Party” network that you constructed from an attribute. If you have not already created this variable, go back to the top and do that now. We’ll wait.

Okay, now that you have the variables ready, you may use them to run a multiple regression.

Strictly speaking, what we are about to do is inappropriate. It is inappropriate because the ties within this network are binary, and multiple linear regression is meant for continuous measures. So, linear regression works well if your dependent network has valued ties (e.g., trade networks, layered networks, etc.)

Because we are using a binary network as our dependent variable, we can interpret the estimates as explaining the probability of a tie in the dependent network, controlling for other variables in the model.

In the output above, x1 represents the first variable in the model (marriage ties), and x2 represents the second variable that we entered (party ties).

The F-statistic tells us that the model is useful (p<0.05). When we consider the individual predictors, it is apparent from the two-tailed p-values ( Pr(>=|b|) ) that party ties do not explain business ties (p=0.97) and marriage ties do explain business ties (p<0.0001). We can, therefore, interpret the model estimates as telling us that the presence of a marriage tie increases the probability of a business tie by 0.33. Conversely, we interpret party ties as not explaining business ties, since party was not significant.

Again, this example would have been much better if we had valued ties so that we could predict or explain the value of a tie, given the various independent variables. For a much better way to analyze a binary network, see below.

QAP Logistic Regression

If you have a network with binary ties, this is the regression method that you should use to predict ties in the outcome network (dependent variable).

Below, we run the same model. However, this time, we are using the appropriate distribution. The process is the same, but the interpretation is somewhat more involved.

The logistic model looks fairly similar to the linear regression model. The estimates, however, are interpreted very differently.

As above, party ties do not explain business ties (p=0.98) and marriage ties do explain business ties (p<0.0001). In this output, the estimates for this test are given in two columns. The first column, labeled “Estimate”, gives the log odds. You can interpret this as a tendency towards forming ties (for positive values), or a tendency away from forming ties (for negative values). It is, however, much easier to read the second column, labeled “Exp(b)”. The values in the second column are the log odds converted to odds ratios by taking their exponent. To interpret the odds ratios, think of them as the likelihood of a tie forming given the presence of some other factor, as compared with the absence of that factor.

Using the second column of estimates, the output indicates that a business tie is 8.83 times more likely to form in the presence of a marriage tie than in its absence. If you are interested in betting, think of the odds of forming a business tie as 8.83:1 in the presence of a marriage tie. Party is not significant (p=0.98), so we interpret party ties as not explaining business ties in this model.

Bonus: Visualizing Two Networks, Side-by-Side, Using the Same Coordinates

Sometimes, it is handy to compare networks. If both networks contain all the same nodes, then it can be very helpful to use the same coordinates in order to help display the differences in the structure. Doing this requires just a couple of visualization tools.

Let’s start with the coordinates. In statnet , preserving coordinates is done by saving a visualization layout as an object. There really is little more to it than plotting the network as usual, and saving that into an object. The thing to keep in mind is that the layout will be different each time you plot. The starting point for the coordinates is randomly shuffled somewhat each time you run the gplot() function. So, it is a good idea to use the set.seed() function to make that a little more predictable.

When you have an orientation you like, preserve it. Don’t worry if the labels go outside of the plot area at this point. We will address that momentarily, when we visualize the networks side-by-side.

The coordinates are saved as x and y values. To see them, just type in whatever name you used to store them.

You may also notice that the labels are running off of the plot a little. We can fix that, but we need to understand where the outer limits of the coordinates are.

To see this, you can read through the coordinates, or you can use R to check the minimum ( min() ) and maximum ( max() ) values in the coordinates.

We can see that the x coordinates range from -2.12 to 10.10, and the y coordinates range from -9.10 to 3.68. We will use those in a moment.

In the meantime, we will focus on displaying the networks side-by-side. The par() function tells the graphics window how to behave. Among other things, you can include the option mfrow=c(nrows, ncols) to create a matrix of visualizations. Using this option, the graphics window will be filled in by row.

Let’s use all of this to create our side-by-side plots.

Notice, we will use the coords object we created earlier for both plots. You will also notice an additional argument: xlim() . The xlim() and ylim() arguments can be included in the plot to widen ( xlim ) the plot area, or increase the vertical dimensions ( ylim ).

This time, we are using only the xlim argument to essentially widen the plot area a little. As we saw above, the x axis for the plot ranges from -2.12 to 10.10. Below, we can set the limits just a little wider (-4 to 14) to give a little more space for the labels. We are also decreasing the label size ( label.cex ) a little. The default size for the labels is 1, so we are decreasing that about 25%.

Your own adjustments are likely to be different. So, consider the steps we took here when you plot your networks.

Intro to Network Regression (MRQAP)

Dai Shizuka, updated 06/12/19.

Perhaps the most common class of statistical models ecologists employ is the linear model. Linear models can be easily applied to predict the effect of one individual attribute on another (e.g., effect of node trait on node position in a network). However, the task is much more difficult when we want to predict the effect of some relational variable on network relations–e.g., how the similarity between two nodes affects the probability of an edge, or how network relations predict the transmission of pathogens. This is because relational data are not independent, so we require methods that can account for this non-independence.

The Mantel Test (Mantel 1967) has long been a useful approach for this type of data. The Mantel Test is a nonparametric matrix correlation test that uses permutations (node-permutations in this case) to get around the problem of non-independence. This test is widely available in several different R packages, for example those focused on community ecology (e.g., ‘ecodist’).

The Quadratic Assignment Procedure (QAP) is an extension of the Mantel Test in a regression framework, developed in the field of psychometrics. The Multiple Regression Quadratic Assignment Procedure (MRQAP) is an extension of this approach to allow for multiple covariate matrices (Krackhardt 1988). Essentially, MRQAP allows you to determine the influence of one matrix on another, controlling for the effects of one or more covariate matrices. There are several forms of MRQAP, but the most popular method is called the Double Semipartialling (DSP) method, developed by Dekker et al. (2007). This method permutes the matrix of residuals from the ordinary least regression of the dependent matrix on the independent matrices, to estimate error and calculate the effects.

MRQAP can be implemented in the R packages ‘sna’ (which is bundled inside ‘statnet’) and ‘asnipe’. There are slight differences between the outputs for the two functions, but the results are nearly the same. Here is a simple code just to illustrate how to implement these:

Next: 8. Diffusion in Networks

Perform Quadratic Assignment Procedure (QAP) Hypothesis Tests for Graph-Level Statistics

Description.

qaptest tests an arbitrary graph-level statistic (computed on dat by FUN ) against a QAP null hypothesis, via Monte Carlo simulation of likelihood quantiles. Note that a fair amount of flexibility is possible regarding QAP tests on functions of such statistics (see an equivalent discussion with respect to CUG null hypothesis tests in Anderson et al. (1999)). See below for more details.

The null hypothesis of the QAP test is that the observed graph-level statistic on graphs G_1,G_2,\dots was drawn from the distribution of said statistic evaluated (uniformly) on the set of all relabelings of G_1,G_2,\dots . Pragmatically, this test is performed by repeatedly (randomly) relabeling the input graphs, recalculating the test statistic, and then evaluating the fraction of draws greater than or equal to (and less than or equal to) the observed value. This accumulated fraction approximates the integral of the distribution of the test statistic over the set of unlabeled input graphs.
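As a language-agnostic illustration of the procedure just described (randomly relabel the input graphs, recompute the statistic, and take the tail fractions as p-values), here is a minimal sketch in Python. The adjacency matrices are made up for illustration, and graph correlation stands in for the arbitrary graph-level statistic.

```python
import random

# Two small adjacency matrices (assumed data, for illustration only).
G = [[0,1,1,0,0],[1,0,1,0,0],[1,1,0,1,0],[0,0,1,0,1],[0,0,0,1,0]]
H = [[0,1,0,0,0],[1,0,1,1,0],[0,1,0,1,0],[0,1,1,0,1],[0,0,0,1,0]]
n = 5

def gcor(a, b):
    # Pearson correlation over the off-diagonal dyads (graph correlation)
    xs = [a[i][j] for i in range(n) for j in range(n) if i != j]
    ys = [b[i][j] for i in range(n) for j in range(n) if i != j]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx)**2 for x in xs) * sum((y - my)**2 for y in ys)) ** 0.5
    return num / den

def relabel(a, p):
    # Apply the same node permutation to rows and columns
    return [[a[p[i]][p[j]] for j in range(n)] for i in range(n)]

random.seed(1)
obs = gcor(G, H)
draws = []
for _ in range(2000):
    p = list(range(n))
    random.shuffle(p)
    draws.append(gcor(relabel(G, p), H))

p_ge = sum(d >= obs for d in draws) / len(draws)   # upper-tail p-value
p_le = sum(d <= obs for d in draws) / len(draws)   # lower-tail p-value
```

The accumulated fractions p_ge and p_le approximate the quantiles of the observed statistic under the relabeling null, exactly as in the description above.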

The qaptest procedure returns a qaptest object containing the estimated likelihood (distribution of the test statistic under the null hypothesis), the observed value of the test statistic on the input data, and the one-tailed p-values (estimated quantiles) associated with said observation. As usual, the (upper tail) null hypothesis is rejected for significance level alpha if p>=observation is less than alpha (or p<=observation, for the lower tail); if the hypothesis is undirected, then one rejects if either p<=observation or p>=observation is less than alpha/2. Standard caveats regarding the use of null hypothesis testing procedures are relevant here: in particular, bear in mind that a significant result does not necessarily imply that the likelihood ratio of the null model and the alternative hypothesis favors the latter.

In interpreting a QAP test, it is important to bear in mind the nature of the QAP null hypothesis. The QAP test should not be interpreted as evaluating underlying structural differences; indeed, QAP is more accurately understood as testing differences induced by a particular vertex labeling controlling for underlying structure. Where there is substantial automorphism in the underlying structures, QAP will tend to give non-significant results. (In fact, it is impossible to obtain a one-tailed significance level in excess of \max_{g \in \{G,H\}} \frac{|Aut(g)|}{|Perm(g)|} when using a QAP test on a bivariate graph statistic f(G,H) , where Aut(g) and Perm(g) are the automorphism and permutation groups on g, respectively. This follows from the fact that all members of Aut(g) will induce the same values of f() .) By the same token, significance under QAP does not necessarily imply that the observed structural relationship is unusual relative to what one would expect from typical structures with (for instance) the sizes and densities of the graphs in question. In contexts in which one's research question implies a particular labeling of vertices (e.g., "within this group of individuals, do friends also tend to give advice to one another"), QAP can be a very useful way of ruling out spurious structural influences (e.g., some respondents tend to indiscriminately nominate many people (without regard to whom), resulting in a structural similarity which has nothing to do with the identities of those involved). Where one's question does not imply a labeled relationship (e.g., is the shape of this group's friendship network similar to that of its advice network), the QAP null hypothesis is inappropriate.

The function's return value is an object of class qaptest, containing the quantities described above.

Carter T. Butts [email protected]


  • Introduction:  Applying statistical tools to network data
  • Univariate descriptive statistics
  • Hypotheses about one mean or density
  • Hypotheses about two paired means or densities
  • Correlation between two networks with the same actors
  • Network regression
  • Hypotheses about the means of two groups
  • Hypotheses about the means of multiple groups
  • Regressing position on attributes
  • Hypotheses about relations within/between groups
  • Homophily models
  • Hypotheses about similarity and distance
  • The probability of a dyadic tie:  Leinhardt's P1

Network analysis in the social sciences developed from a conjuncture of anthropologists' observations about relations in face-to-face groups and mathematical graph theory.  A very large part of social network methodology, consequently, deals with relatively small networks, networks where we have confidence in the reliability of our observations about the relations among the actors.  Most of the tools of social network analysis involve the use of mathematical functions to describe networks and their sub-structures.

In more recent work, however, some of the focus of social network research has moved away from these roots.  Increasingly, the social networks that are being studied may contain many nodes; and, sometimes our observations about these very large networks are based not on censuses, but on samples of nodes.  Network researchers have come to recognize that the relations that they study may be constantly evolving, and that the relations observed at one point in time may not be entirely typical because the pattern of relations is not "in equilibrium."  They have also recognized that sometimes our observations are fallible -- we fail to record a relation that actually exists, or mis-measure the strength of a tie.

All of these concerns (large networks, sampling, concern about the reliability of observations) have led social network researchers to begin to apply the techniques of descriptive and inferential statistics in their work.  Statistics provide useful tools for summarizing large amounts of information, and for treating observations as stochastic, rather than deterministic outcomes of social processes.

Descriptive statistics have proven to be of great value because they provide convenient tools to summarize key facts about the distributions of actors, attributes, and relations; statistical tools can describe not only the shape of one distribution, but also joint distributions, or "statistical association."  So, statistical tools have been particularly helpful in describing, predicting, and testing hypotheses about the relations between network properties.

Inferential statistics have also proven to have very useful applications to social network analysis.  At a most general level, the question of "inference" is:  how much confidence can I have that the pattern I see in the data I've collected is actually typical of some larger population, or that the apparent pattern is not really just a random occurrence?

In this chapter we will look at some of the ways in which quite basic statistical tools have been applied in social network analysis.  These are only the starting point.  The development of more powerful statistical tools especially tuned for the needs of social network analysis is one of the most rapidly developing "cutting edges" of the field.

Most social scientists have a reasonable working knowledge of basic univariate and bivariate descriptive and inferential statistics.  Many of these tools find immediate application in working with social network data.  There are, however, two quite important distinctive features of applying these tools to network data.

First, and most important, social network analysis is about relations among actors, not about relations between variables.  Most social scientists have learned their statistics with applications to the study of the distribution of the scores of actors (cases) on variables, and the relations between these distributions.  We learn about the mean of a set of scores on the variable "income."  We learn about the Pearson zero-order product moment correlation coefficient for indexing linear association between the distribution of actors' incomes and actors' educational attainment.

The application of statistics to social networks is also about describing distributions and relations among distributions.  But, rather than describing distributions of attributes of actors (or "variables"), we are concerned with describing the distributions of relations among actors.  In applying statistics to network data, we are concerned with issues like the average strength of the relations between actors; we are concerned with questions like "is the strength of ties between actors in a network correlated with the centrality of the actors in the network?"   Most of the descriptive statistical tools are the same for attribute analysis and for relational analysis -- but the subject matter is quite different!

Second, many of the tools of standard inferential statistics that we learned from the study of the distributions of attributes do not apply directly to network data.  Most of the standard formulas for calculating estimated standard errors, computing test statistics, and assessing the probability of null hypotheses that we learned in basic statistics don't work with network data (and, if used, can give us "false positive" answers more often than "false negative").  This is because the "observations" or scores in network data are not "independent" samplings from populations.  In attribute analysis, it is often very reasonable to assume that Fred's income and Fred's education are a "trial" that is independent of Sue's income and Sue's education.  We can treat Fred and Sue as independent replications.

In network analysis, we focus on relations, not attributes.  So, one observation might well be Fred's tie with Sue; another observation might be Fred's tie with George; still another might be Sue's tie with George.  These are not "independent" replications.  Fred is involved in two observations (as are Sue and George), so it is probably not reasonable to suppose that these relations are "independent," because each pair of them shares an actor.

The standard formulas for computing standard errors and inferential tests on attributes generally assume independent observations.  Applying them when the observations are not independent can be very misleading.  Instead, alternative numerical approaches to estimating standard errors for network statistics are used.  These "boot-strapping" (and permutations) approaches calculate sampling distributions of statistics directly from the observed networks by using random assignment across hundreds or thousands of trials under the assumption that null hypotheses are true.

These general points will become clearer as we examine some real cases.  So, let's begin with the simplest univariate descriptive and inferential statistics, and then move on to somewhat more complicated problems.

For most of the examples in this chapter, we'll focus again on the Knoke data set that describes the two relations of the exchange of information and the exchange of money among ten organizations operating in the social welfare field.  Figure 18.1 lists these data.

Figure 18.1.  Listing ( Data>Display ) of Knoke information and money exchange matrices

These particular data happen to be asymmetric and binary.  Most of the statistical tools for working with network data can be applied to symmetric data, and data where the relations are valued (strength, cost, probability of a tie).  As with any descriptive statistics, the scale of measurement (binary or valued) does matter in making proper choices about interpretation and application of many statistical tools.

The data that are analyzed with statistical tools when we are working with network data are the observations about relations among actors.  So, in each matrix, we have 10 x 10 = 100 observations or cases.  For many analyses, the ties of actors with themselves (the main diagonal) are not meaningful, and are not used, so there would be (N * (N - 1) = 90) observations.  If data are symmetric (i.e. Xij = Xji), half of these are redundant, and wouldn't be used, so there would be (N * (N - 1) / 2 = 45) observations.

What we would like to summarize with our descriptive statistics are some characteristics of the distribution of these scores.  Tools>Univariate Stats can be used to generate the most commonly used measures for each matrix (select the matrix in the dialog, and choose whether or not to include the diagonal).  Figure 18.2 shows the results for our example data, excluding the diagonal.

Figure 18.2.  Univariate descriptive statistics for Knoke information and money whole networks

For the information sharing relation, we see that we have 90 observations which range from a minimum score of zero to a maximum of one.  The sum of the ties is 49, and the average value of the ties is 49/90 = .544.  Since the relation has been coded as a "dummy" variable (zero for no relation, one for a relation) the mean is also the proportion of possible ties that are present (or the density), or the probability that any given tie between two random actors is present (54.4% chance).
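The density arithmetic above is easy to reproduce. A minimal sketch in Python (the 3 x 3 matrix is a toy example, not the Knoke data):

```python
import numpy as np

def density(adj):
    """Density of a directed binary network, excluding the diagonal:
    the sum of off-diagonal ties over the N * (N - 1) possible ties."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    off_diag_sum = adj.sum() - np.trace(adj)   # ignore self-ties
    return off_diag_sum / (n * (n - 1))        # 90 cells when n = 10

adj = np.array([[0, 1, 1],
                [0, 0, 1],
                [1, 0, 0]])
print(round(density(adj), 3))  # → 0.667 (4 ties out of 6 possible)
```

With the Knoke information matrix this calculation gives 49 / 90 = .544, matching the figure.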

Several measures of the variability of the distribution are also given.  The sums of squared deviations from the mean, variance, and standard deviation are computed -- but are more meaningful for valued than binary data.  The Euclidean norm (which is the square root of the sum of squared values) is also provided.  One measure not given, but sometimes helpful is the coefficient of variation (standard deviation / mean times 100) equals 91.5.  This suggests quite a lot of variation as a percentage of the average score.  No statistics on distributional shape (skew or kurtosis) are provided by UCINET.

A quick scan tells us that the mean (or density) for money exchange is lower, and has slightly less variability.

In addition to examining the entire distribution of ties, we might want to examine the distribution of ties for each actor.  Since the relation we're looking at is asymmetric or directed, we might further want to summarize each actor's sending (row) and receiving (column).  Figures 18.3 and 18.4 show the results of Tools>Univariate Stats for rows (tie sending) and columns (tie receiving) of the information relation matrix.

Figure 18.3.  Univariate descriptive statistics for Knoke information network rows

Figure 18.4.  Univariate descriptive statistics for Knoke information network columns

We see that actor 1 (COUN) has a mean (or density) of tie sending of .444.  That is, this actor sent four ties to the available nine other actors.  Actor 1 received somewhat more information than they sent, as their column mean is .556.  In scanning down the column (in figure 18.3) or row (in figure 18.4) of means, we note that there is quite a bit of variability across actors -- some send more and get more information than others.

With valued data, the means produced index the average strength of ties, rather than the probability of ties.  With valued data, measures of variability may be more informative than they are with binary data (since the variability of a binary variable is strictly a function of its mean).

The main point of this brief section is that when we use statistics to describe network data, we are describing properties of the distribution of relations, or ties among actors -- rather than properties of the distribution of attributes across actors.  The basic ideas of central tendency and dispersion of distributions apply to the distributions of relational ties in exactly the same way that they do to attribute variables -- but we are describing relations, not attributes.

Of the various properties of the distribution of a single variable (e.g. central tendency, dispersion, skewness), we are usually most interested in central tendency.

If we are working with the distribution of relations among actors in a network, and our measure of tie-strength is binary (zero/one), the mean or central tendency is also the proportion of all ties that are present, and is the "density."

If we are working with the distribution of relations among actors in a network, and our measure of tie-strength is valued, central tendency is usually indicated by the average strength of the tie across all the relations.

We may want to test hypotheses about the density or mean tie strength of a network.  In the analysis of variables, this is testing a hypothesis about a single-sample mean or proportion.  We might want to be confident that there actually are ties present (null hypothesis: network density is really zero, and any deviation that we observe is due to random variation).  We might want to test the hypothesis that the proportion of binary ties present differs from .50; we might want to test the hypothesis that the average strength of a valued tie differs from "3."

Network>Compare densities>Against theoretical parameter  performs a statistical test to compare the value of a density or average tie strength observed in a network against a test value.

Let's suppose that I think that all organizations have a tendency to want to directly distribute information to all others in their field as a way of legitimating themselves.  If this theory is correct, then the density of Knoke's information network should be 1.0.  We can see that this isn't true.  But, perhaps the difference between what we see (density = .544) and what the theory predicts (density = 1.000) is due to random variation (perhaps when we collected the information).

The dialog in figure 18.5 sets up the problem.

Figure 18.5.  Dialog of Compare densities>Against theoretical parameter

The "Expected density" is the value against which we want to test.  Here, we are asking the data to convince us that we can be confident in rejecting the idea that organizations send information to all others in their fields.

The parameter "Number of samples" is used for estimating the standard error for the test by means of "bootstrapping," or computing the estimated sampling variance of the mean by drawing 5000 random sub-samples from our network, and constructing a sampling distribution of density measures.  The sampling distribution of a statistic is the distribution of the values of that statistic on repeated sampling.  The standard deviation of the sampling distribution of a statistic (how much variation we would expect to see from sample to sample just by random chance) is called the standard error.  Figure 18.6 shows the results of the hypothesis test.

Figure 18.6.  Test results

We see that our test value was 1.000, the observed value was .5444, so the difference between the null and observed values is -.4556.  How often would a difference this large happen by random sampling variation, if the null hypothesis (density = 1.000) was really true in the population?

Using the classical formula for the standard error of a mean (s / sqrt(N)) we obtain a sampling variability estimate of .0528.  If we used this for our test, the test statistic would be -.4556 / .0528 = -8.6, which would be highly significant as a t-test with N-1 degrees of freedom.
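The classical arithmetic can be checked directly (numbers taken from the text above):

```python
# Classical (independent-observations) test statistic for the density.
se_classical = 0.0528        # s / sqrt(N), from the text
diff = -0.4556               # observed density minus the null value of 1.000
t_stat = diff / se_classical
print(round(t_stat, 1))  # → -8.6
```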

However, if we use the bootstrap method of constructing 5000 networks by sampling random sub-sets of nodes each time, and computing the density each time, the mean of this sampling distribution turns out to be .4893, and its standard deviation (or the standard error) turns out to be .1201.

Using this alternative standard error based on random draws from the observed sample, our test statistic is -3.7943.  This test is also significant (p = .0002).

Why do this?  The classical formula gives an estimate of the standard error (.0528) that is much smaller than that created by the bootstrap method (.1201).  This is because the standard formula is based on the notion that all observations (i.e. all relations) are independent.  But, since the ties are really generated by the same 10 actors, this is not a reasonable assumption.  Using the actual data on the actual actors -- with the observed differences in actor means and variances -- gives a much more realistic approximation to the actual sampling variability that would occur if, say, we happened to miss Fred when we collected the data on Tuesday.
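The node-level bootstrap described above can be sketched as follows. This is a simplified illustration of the idea (resample actors, not cells), not UCINET's exact procedure, and the random matrix is a toy stand-in for real data:

```python
import numpy as np

rng = np.random.default_rng(0)

def density(adj):
    n = adj.shape[0]
    return (adj.sum() - np.trace(adj)) / (n * (n - 1))

def bootstrap_density(adj, trials=5000):
    """Resample actors with replacement, take the density of the induced
    network each time, and use the standard deviation of those densities
    as the bootstrap standard error."""
    n = adj.shape[0]
    dens = np.empty(trials)
    for t in range(trials):
        idx = rng.integers(0, n, size=n)          # random sub-sample of nodes
        dens[t] = density(adj[np.ix_(idx, idx)])  # induced sub-network
    return dens.mean(), dens.std(ddof=1)

A = (rng.random((10, 10)) < 0.5).astype(float)    # toy binary network
np.fill_diagonal(A, 0)
boot_mean, boot_se = bootstrap_density(A, trials=2000)
```

Because the same actors generate many cells, this standard error is typically larger than the classical s / sqrt(N) estimate, as the text observes.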

In general, the standard inferential formulas for computing expected sampling variability (i.e. standard errors) give unrealistically small values for network data.  Using them results in the worst kind of inferential error -- the false positive, or rejecting the null when we shouldn't.

The basic question of bivariate descriptive statistics applied to variables is whether scores on one attribute align (co-vary, correlate) with scores on another attribute, when compared across cases.  The basic question of bivariate analysis of network data is whether the pattern of ties for one relation among a set of actors aligns with the pattern of ties for another relation among the same actors.  That is, do the relations correlate?

Three of the most common tools for bivariate analysis of attributes can also be applied to the bivariate analysis of relations:

Does the central tendency of one relation differ significantly from the central tendency of another?  For example, if we had two networks that described the military and the economic ties among nations, which has the higher density?  Are military or are economic ties more prevalent?  This kind of question is analogous to the test for the difference between means in paired or repeated-measures attribute analysis.

Is there a correlation between the ties that are present in one network, and the ties that are present in another?  For example, are pairs of nations that have political alliances more likely to have high volumes of economic trade?  This kind of question is analogous to the correlation between the scores on two variables in attribute analysis.

If we know that a relation of one type exists between two actors, how much does this increase (or decrease) the likelihood that a relation of another type exists between them?  For example, what is the effect of a one dollar increase in the volume of trade between two nations on the volume of tourism flowing between the two nations?  This kind of question is analogous to the regression of one variable on another in attribute analysis.

In the section above on univariate statistics for networks, we noted that the density of the information exchange matrix for the Knoke bureaucracies appeared to be higher than the density of the monetary exchange matrix.  That is, the mean or density of one relation among a set of actors appears to be different from the mean or density of another relation among the same actors.

Network>Compare densities>Paired (same node)  compares the densities of two relations for the same actors, and calculates estimated standard errors to test differences by bootstrap methods.  When both relations are binary, this is a test for differences in the probability of a tie of one type and the probability of a tie of another type.  When both relations are valued, this is a test for a difference in the mean tie strengths of the two relations.

Let's perform this test on the information and money exchange relations in the Knoke data, as shown in Figure 18.7.

Figure 18.7.  Test for the difference of density in the Knoke information and money exchange relations

Results for both the standard approach and the bootstrap approach (this time, we ran 10,000 sub-samples) are reported in the output.  The difference between means (or proportions, or densities) is .3000.  The standard error of the difference by the classical method is .0697; the standard error by bootstrap estimate is .1237.  The conventional approach greatly underestimates the true sampling variability, and gives a result that is too optimistic in rejecting the null hypothesis that the two densities are the same.

By the bootstrap method, we can see that there is a two-tailed probability of .0178.  If we had a prior alternative hypothesis about the direction of the difference, we could use the one-tailed p level of .0052.  So, we can conclude with great confidence that the density of information ties among organizations is greater than the density of monetary ties.  That is, the observed difference would arise very rarely by chance in random samples drawn from these networks.

If there is a tie between two particular actors in one relation, is there likely to be a tie between them in another relation?  If two actors have a strong tie of one type, are they also likely to have a strong tie of another?

When we have information about multiple relations among the same sets of actors, it is often of considerable interest whether the probability (or strength) of a tie of one type is related to the probability (or strength) of another.  Consider the Knoke information and money ties.  If organizations exchange information, this may create a sense of trust, making monetary exchange relations more likely; or, if they exchange money, this may facilitate more open communications.  That is, we might hypothesize that the matrix of information relations would be positively correlated with the matrix of monetary relations - pairs that engage in one type of exchange are more likely to engage in the other.   Alternatively, it might be that the relations are complementary:  money flows in one direction, information in the other (a negative correlation).  Or, it may be that the two relations have nothing to do with one another (no correlation).

Tools>Testing Hypotheses>Dyadic (QAP)>QAP Correlation  calculates measures of nominal, ordinal, and interval association between the relations in two matrices, and uses quadratic assignment procedures to develop standard errors to test for the significance of association.  Figure 18.8 shows the results for the correlation between the Knoke information and monetary exchange networks.

Figure 18.8.  Association between Knoke information and Knoke monetary networks by QAP correlation

The first column shows the values of five alternative measures of association.  The Pearson correlation is a standard measure when both matrices have valued relations measured at the interval level.  Gamma would be a reasonable choice if one or both relations were measured on an ordinal scale.  Simple matching and the Jaccard coefficient are reasonable measures when both relations are binary; the Hamming distance is a measure of dissimilarity or distance between the scores in one matrix and the scores in the other (it is the number of values that differ, element-wise, from one matrix to the other).

The third column (Avg) shows the average value of the measure of association across a large number of trials in which the rows and columns of the two matrices have been randomly permuted.  That is, what would the correlation (or other measure) be, on the average, if we matched random actors?  The idea of the "Quadratic Assignment Procedure" is to identify the value of the measure of association when there really isn't any systematic connection between the two relations.  This value, as you can see, is not necessarily zero -- because different measures of association will have limited ranges of values based on the distributions of scores in the two matrices.  We note, for example, that there is an observed simple matching of .456 (i.e. if there is a 1 in a cell in matrix one, there is a 45.6% chance that there will be a 1 in the corresponding cell of matrix two).  This would seem to indicate association.  But, because of the density of the two matrices, matching randomly re-arranged matrices will display an average matching of .475.  So the observed measure differs hardly at all from a random result.

To test the hypothesis that there is association, we look at the proportion of random trials that would generate a coefficient as large as (or as small as, depending on the measure) the statistic actually observed.  These figures are reported (from the random permutation trials) in the columns labeled "P(large)" and "P(small)."  The appropriate one of these values to test the null hypothesis of no association is shown in the column "Signif."
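The whole QAP correlation procedure -- observed association, average over random relabelings, and the "P(large)"/"P(small)" tail proportions -- can be sketched in Python. This illustrates the idea, not UCINET's implementation, and uses toy random matrices rather than the Knoke data:

```python
import numpy as np

rng = np.random.default_rng(42)

def qap_correlation(X, Y, trials=2000):
    """Correlate the off-diagonal cells of X and Y, then rebuild the
    reference distribution by permuting the rows and columns of Y
    *together* (relabeling actors while preserving structure)."""
    n = X.shape[0]
    mask = ~np.eye(n, dtype=bool)              # drop the diagonal
    obs = np.corrcoef(X[mask], Y[mask])[0, 1]
    perm = np.empty(trials)
    for t in range(trials):
        p = rng.permutation(n)
        Yp = Y[np.ix_(p, p)]                   # simultaneous row/col permute
        perm[t] = np.corrcoef(X[mask], Yp[mask])[0, 1]
    p_large = np.mean(perm >= obs)             # "P(large)"
    p_small = np.mean(perm <= obs)             # "P(small)"
    return obs, perm.mean(), p_large, p_small

n = 10
X = (rng.random((n, n)) < 0.5).astype(float)   # toy relation one
Y = X.copy()                                   # perfectly associated toy case
obs, avg, p_large, p_small = qap_correlation(X, Y, trials=500)
```

Permuting rows and columns simultaneously is the key move: it scrambles which actor is which while leaving each matrix's structure (density, degree distribution) untouched, which is exactly why the null average need not be zero.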

Rather than correlating one relation with another, we may wish to predict one relation knowing the other.  That is, rather than symmetric association between the relations, we may wish to examine asymmetric association.  The standard tool for this question is linear regression, and the approach may be extended to using more than one independent variable.

Suppose, for example, that we wanted to see if we could predict which of the Knoke bureaucracies sent information to which others.  We can treat the information exchange network as our "dependent" network (with N = 90).

We might hypothesize that the presence of a money tie from one organization to another would increase the likelihood of an information tie (of course, from the previous section, we know this isn't empirically supported!).  Furthermore, we might hypothesize that institutionally similar organizations would be more likely to exchange information.  So, we have created another 10 by 10 matrix, coding each element to be a "1" if both organizations in the dyad are governmental bodies, or both are non-governmental bodies, and "0" if they are of mixed types.

We can now perform a standard multiple regression analysis by regressing each element in the information network on its corresponding elements in the monetary network and the government institution network.  To estimate standard errors for R-squared and for the regression coefficients, we can use quadratic assignment.  We will run many trials with the rows and columns in the dependent matrix randomly shuffled, and recover the R-square and regression coefficients from these runs.  These are then used to assemble empirical sampling distributions to estimate standard errors under the hypothesis of no association.
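The steps above can be sketched in Python as a simple Y-permutation QAP regression. This is an illustration of the general approach (UCINET's "full partialling" method is a refinement of it), with toy random matrices standing in for the money and institutional-similarity networks:

```python
import numpy as np

rng = np.random.default_rng(1)

def qap_regression(Y, predictors, trials=1000):
    """Regress the off-diagonal cells of Y on those of each predictor
    matrix; then re-estimate after jointly permuting the rows and
    columns of Y, building null distributions for the coefficients."""
    n = Y.shape[0]
    mask = ~np.eye(n, dtype=bool)
    X = np.column_stack([np.ones(mask.sum())] +
                        [M[mask] for M in predictors])

    def fit(y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return beta

    obs = fit(Y[mask])
    perm = np.empty((trials, obs.size))
    for t in range(trials):
        p = rng.permutation(n)
        perm[t] = fit(Y[np.ix_(p, p)][mask])
    # two-sided: share of permuted coefficients at least as extreme
    pvals = np.mean(np.abs(perm) >= np.abs(obs), axis=0)
    return obs, pvals

n = 10
money = (rng.random((n, n)) < 0.3).astype(float)       # toy predictor
same_type = (rng.random((n, n)) < 0.5).astype(float)   # toy predictor
info = (rng.random((n, n)) < 0.5).astype(float)        # toy dependent
coefs, pvals = qap_regression(info, [money, same_type], trials=300)
```

Only the dependent matrix is permuted; the predictors stay fixed, so the null distribution reflects "no association between Y's labeling and the predictors" while preserving every matrix's internal structure.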

Version 6.81 of UCINET offers four alternative methods for Tools>Testing Hypotheses>Dyadic (QAP)>QAP Regression.  Figure 18.9 shows the results of the "full partialling" method.

Figure 18.9.  QAP regression of information ties on money ties and governmental status by full partialling method

The descriptive statistics and measure of goodness of fit are standard multiple regression results -- except, of course, that we are looking at predicting relations between actors, not the attributes of actors.

The model R-square (.018) indicates that knowing whether one organization sends money to another, and whether the two organizations are institutionally similar reduces uncertainty in predicting an information tie by only about 2%.  The significance level (by the QAP method) is .120.  Usually, we would conclude that we cannot be sure the observed result is non-random.

Since the dependent matrix in this example is binary, the regression equation is interpretable as a linear probability model (one might want to consider logit or probit models -- but UCINET does not provide these).  The intercept indicates that, if two organizations are not of the same institutional type, and one does not send money to the other, the probability that one sends information to the other is .61.  If one organization does send money to the other, this reduces the probability of an information link by .046.  If the two organizations are of the same institutional type, the probability of information sending is reduced by .124.
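Reading predictions off the fitted linear probability model is simple arithmetic (the coefficients are taken from the text above; the function name is illustrative):

```python
def p_info_tie(money_tie, same_type):
    """Predicted probability of an information tie under the fitted
    linear probability model: intercept .61, money coefficient -.046,
    same-institutional-type coefficient -.124."""
    return 0.61 - 0.046 * money_tie - 0.124 * same_type

print(round(p_info_tie(0, 0), 3))  # → 0.61  (baseline dyad)
print(round(p_info_tie(1, 1), 3))  # → 0.44  (money tie, same type)
```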

Using the QAP method, however, none of these effects are different from zero at conventional (e.g. p < .05) levels.  The results are interesting - they suggest that monetary and informational linkages are, if anything, alternative rather than re-enforcing ties, and that institutionally similar organizations are less likely to communicate.  But, we shouldn't take these apparent patterns seriously, because they could appear quite frequently simply by random permutation of the cases.

The tools in this section are very useful for examining how multi-plex relations among a set of actors "go together."  These tools can often be helpful additions to some of the tools for working with multi-plex data that we examined in chapter 16.

In the previous section we examined methods for testing differences and association among whole networks.  That is, studying the macro-patterns of how an actor's position in one network might be associated with their position in another.

We are often interested in micro questions, as well.  For example: does an actor's gender affect their between-ness centrality?  This question relates an attribute (gender) to a measure of the actor's position in a network (between-ness centrality).  We might be interested in the relationship between two (or more) aspects of actor's positions.  For example:  how much of the variation in actor's between-ness centrality can be explained by their out-degree and the number of cliques that they belong to?  We might even be interested in the relationship between two individual attributes among a set of actors who are connected in a network.  For example, in a school classroom, is there an association between actor's gender and their academic achievement?

In all of these cases we are focusing on variables that describe individual nodes.  These variables may be either non-relational attributes (like gender), or variables that describe some aspect of an individual's relational position (like between-ness).  In most cases, standard statistical tools for the analysis of variables can be applied to describe differences and associations.

But, standard statistical tools for the analysis of variables cannot be applied to inferential questions --  hypothesis or significance tests, because the individuals we are examining are not independent observations drawn at random from some large population.  Instead of applying the normal formulas (i.e. those built into statistical software packages and discussed in most basic statistics texts), we need to use other methods to get more correct estimates of the reliability and stability of estimates (i.e. standard errors).  The "boot-strapping" approach (estimating the variation of estimates of the parameter of interest from large numbers of random sub-samples of actors) can be applied in some cases; in other cases, the idea of random permutation can be applied to generate correct standard errors.

Suppose we had the notion that private-for-profit organizations were less likely to actively engage in sharing information with others in their field than were government organizations.  We would like to test this hypothesis by comparing the average out-degree of governmental and non-governmental actors in one organizational field.

Using the Knoke information exchange network, we've run Network>Centrality>Degree , and saved the results in the output file "FreemanDegree" as a UCINET dataset.  We've also used Data>Spreadsheets>Matrix to create a UCINET attribute file "knokegovt" that has a single column dummy code (1 = governmental organization, 0 = non-governmental organization).

Let's perform a simple two-sample t-test to determine if the mean degree centrality of government organizations is lower than the mean degree centrality of non-government organizations.  Figure 18.10 shows the dialog for Tools>Testing Hypotheses>Node-level>T-Test  to set up this test.

Figure 18.10.  Dialog for Tools>Testing Hypotheses>Node-level>T-Test 

Since we are working with individual nodes as observations, the data are located in a column (or, sometimes, a row) of one or more files.  Note how the file names (selected by browsing, or typed) and the columns within the file are entered in the dialog.  The normed Freeman degree centrality measure happens to be located in the second column of its file; there is only one vector (column) in the file that we created to code government/non-government organizations.

For this test, we have selected the default of 10,000 trials to create the permutation-based sampling distribution of the difference between the two means.  For each of these trials, the scores on normed Freeman degree centrality are randomly permuted (that is, randomly assigned to government or non-government, in proportion to the number of each type).  The standard deviation of this distribution based on random trials becomes the estimated standard error for our test.  Figure 18.11 shows the results.
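The permutation logic of this test can be sketched as follows.  This is a minimal Python illustration with invented scores, not UCINET's implementation: the group labels are shuffled many times, and the observed difference in means is compared to the resulting null distribution.

```python
import random

def permutation_t_test(scores, groups, n_trials=10000, seed=42):
    """Permutation test for a difference in group means.
    `groups` is a 0/1 vector; labels are shuffled each trial (keeping
    group sizes fixed) to build the null distribution of the difference."""
    rng = random.Random(seed)

    def mean_diff(g):
        a = [s for s, x in zip(scores, g) if x == 1]
        b = [s for s, x in zip(scores, g) if x == 0]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = mean_diff(groups)
    labels = list(groups)
    diffs = []
    for _ in range(n_trials):
        rng.shuffle(labels)
        diffs.append(mean_diff(labels))
    # one-tailed p: proportion of random trials at least as large as observed
    p = sum(1 for d in diffs if d >= observed) / n_trials
    # SD of the permutation distribution is the estimated standard error
    m = sum(diffs) / n_trials
    se = (sum((d - m) ** 2 for d in diffs) / (n_trials - 1)) ** 0.5
    return observed, se, p

# invented example: group 1 scores clearly higher than group 0
observed, se, p = permutation_t_test([1, 1, 1, 9, 9, 9], [0, 0, 0, 1, 1, 1])
```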

Figure 18.11.  Test for difference in mean normed degree centrality of Knoke government and non-government organizations

The output first reports basic descriptive statistics for each group.  The group numbers are assigned according to the order of the cases in the file containing the independent variable.  In our example, the first node was COUN, a government organization; so, government became "Group 1" and non-government became "Group 2."

We see that the average normed degree centrality of government organizations (75) is 6.481 units higher than the average normed degree centrality of non-governmental organizations (68.519).  This would seem to support our hypothesis; but tests of statistical significance urge considerable caution.  Differences as large as 6.481 in favor of government organizations happen 33.4% of the time in random trials -- so we would be taking an unacceptable risk of being wrong if we concluded that the data were consistent with our research hypothesis.

UCINET does not print the estimated standard error, or the values of the conventional two-group t-test.

The approach to estimating difference between the means of two groups discussed in the previous section can be extended to multiple groups with one-way analysis of variance (ANOVA).  The procedure Tools>Testing Hypotheses>Node-level>Anova provides the regular OLS approach to estimating differences in group means.  Because our observations are not independent, the procedure of estimating standard errors by random replications is also applied.
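The same permutation idea extends directly to the F statistic.  A minimal sketch, assuming Python and made-up scores for three groups; each trial reassigns group labels at random and recomputes F.

```python
import random

def permutation_anova(scores, groups, n_trials=5000, seed=1):
    """One-way ANOVA F statistic with a permutation-based p-value.
    Group labels are randomly reassigned to scores on each trial."""
    def f_stat(g):
        keys = set(g)
        n = len(scores)
        grand = sum(scores) / n
        ss_between = ss_within = 0.0
        for k in keys:
            vals = [s for s, x in zip(scores, g) if x == k]
            m = sum(vals) / len(vals)
            ss_between += len(vals) * (m - grand) ** 2
            ss_within += sum((v - m) ** 2 for v in vals)
        df_b = len(keys) - 1
        df_w = n - len(keys)
        return (ss_between / df_b) / (ss_within / df_w)

    rng = random.Random(seed)
    observed = f_stat(groups)
    labels = list(groups)
    count = 0
    for _ in range(n_trials):
        rng.shuffle(labels)
        if f_stat(labels) >= observed:
            count += 1
    return observed, count / n_trials

# invented scores for three clearly separated groups
f_obs, p = permutation_anova([1, 2, 1, 5, 6, 5, 9, 10, 9],
                             [0, 0, 0, 1, 1, 1, 2, 2, 2])
```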

Suppose we divided the 23 large donors to California political campaigns into three groups, and have coded a single column vector in a UCINET attribute file.  We've coded each donor as falling into one of three groups:  "others," "capitalists," or "workers."

If we examine the network of connections among donors (defined by co-participating in the same campaigns), we anticipate that the workers' groups will display higher eigenvector centrality than donors in the other groups.  That is, we anticipate that the "left" interest groups will display considerable interconnection, and -- on the average -- have members that are more connected to highly connected others than is true for the capitalist and other groups.  We've calculated eigenvector centrality using Network>Centrality>Eigenvector , and stored the results in another UCINET attribute file.

The dialog for Tools>Testing Hypotheses>Node-level>Anova looks very much like Tools>Testing Hypotheses>Node-level>T-test , so we won't display it.  The results of our analysis are shown as figure 18.12.

Figure 18.12.  One-way ANOVA of eigenvector centrality of California political donors, with permutation-based standard errors and tests

The mean eigenvector centrality of the eight "other" donors is .125.  For the seven "capitalists" it is .106, and for the seven "workers" groups it is .323 (calculated elsewhere).  The differences among these means are highly significant (F = 34.4 with 2 d.f. and p = .0002).  The differences in group means account for 78% of the total variance in eigenvector centrality scores among the donors.

Where the attribute of actors that we are interested in explaining or predicting is measured at the interval level, and one or more of our predictors are also at the interval level, multiple linear regression is a common approach.  Tools>Testing Hypotheses>Node-level>Regression will compute basic linear multiple regression statistics by OLS, and estimate standard errors and significance using the random permutations method for constructing sampling distributions of R-squared and slope coefficients.
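For the simplest (one-predictor) case, the permutation approach to regression can be sketched like this.  The data are invented; UCINET's routine handles multiple predictors and also builds a permutation distribution for R-squared, which this sketch omits.

```python
import random

def permutation_regression(y, x, n_trials=2000, seed=7):
    """Simple OLS slope with a permutation p-value: the dependent scores
    are randomly reassigned to cases to build the null distribution."""
    n = len(y)

    def slope(yy):
        mx = sum(x) / n
        my = sum(yy) / n
        num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, yy))
        den = sum((xi - mx) ** 2 for xi in x)
        return num / den

    rng = random.Random(seed)
    observed = slope(y)
    yy = list(y)
    count = 0
    for _ in range(n_trials):
        rng.shuffle(yy)
        # two-tailed: count slopes at least as extreme as observed
        if abs(slope(yy)) >= abs(observed):
            count += 1
    return observed, count / n_trials

# invented data with an exact slope of 2
b, p = permutation_regression([2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
                              list(range(1, 11)))
```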

Let's continue the example in the previous section.  Our dependent attribute, as before, is the eigenvector centrality of the individual political donors.  This time, we will use three independent vectors, which we have constructed using Data>Spreadsheets>Matrix , as shown in figure 18.13.

Figure 18.13.  Construction of independent vectors for multiple linear regression

Two dummy variables have been constructed to indicate whether each donor is a member of the "capitalist" or the "worker" group.  The omitted category ("other") will serve as the intercept/reference category.  POSCOAL is the mean number of times that each donor participates on the same side of issues with other donors (a negative score indicates opposition to other donors).

Substantively, we are trying to find out whether the workers' higher eigenvector centrality (observed in the section above) is simply a function of higher rates of participation in coalitions, or whether the workers have better-connected allies -- independent of high participation.

Figure 18.14 shows the dialog to specify the dependent and the multiple independent vectors.

Figure 18.14.  Dialog for Tools>Testing Hypotheses>Node-level>Regression for California donor's eigenvector centrality

Note that all of the independent variables need to be entered into a single data set (with multiple columns).  All of the basic regression statistics can be saved as output, for use in graphics or further analysis.  Figure 18.15 shows the result of the multiple regression estimation.

Figure 18.15.  Multiple regression of eigenvector centrality with permutation based significance tests

The correlation matrix shows a very high collinearity between being in the workers group (variable 3) and participation in coalitions (variable 4).  This suggests that it may be difficult to separate effects of simple participation from those of being a workers interest group.

The R-squared is very high for this simple model (.987), and highly significant using permutation tests ( p = .014).

Controlling for total coalition participation, capitalist interests are likely to have slightly lower eigenvector centrality than others (-.0078), but this is not significant (p = .555).  Workers groups do appear to have higher eigenvector centrality, even controlling for total coalition participation (.075), but this tendency may be a random result (a one-tailed significance is only p = .102).  The higher the rate of participation in coalitions (POSCOAL), the greater the eigenvector centrality of actors (.0615, p = .021), regardless of which type of interest is being represented.

As before, the coefficients are generated by standard OLS linear modeling techniques, and are based on comparing scores on independent and dependent attributes of individual actors.  What differs here is the recognition that the actors are not independent, so that estimation of standard errors by simulation, rather than by standard formula, is necessary.

The t-test, ANOVA, and regression approaches discussed in this section are all calculated at the micro, or individual actor level.  The measures that are analyzed as independent and dependent may be either relational or non-relational.  That is, we could be interested in predicting and testing hypotheses about actors non-relational attributes (e.g. their income) using a mix of relational (e.g. centrality) and non-relational (e.g. gender) attributes.  We could be interested in predicting a relational attribute of actors (e.g. centrality) using a mix of relational and non-relational independent variables.

The examples illustrate how relational and non-relational attributes of actors can be analyzed using common statistical techniques.  The key thing to remember, though, is that the observations are not independent (since all the actors are members of the same network).  Because of this, direct estimation of the sampling distributions and resulting inferential statistics is needed -- standard, basic statistical software will not give correct answers.

In the previous section we looked at some tools for hypotheses about individual actors embedded in networks.  Models like these are very useful for examining the relationships among relational and non-relational attributes of individuals.

One of the most distinctive ways in which statistical analysis has been applied to social network data is to focus on predicting the relations of actors, rather than their attributes.  Rather than building a statistical model to predict each actor's out-degree, we could, instead, predict whether there was a tie from each actor to each other actor.  Rather than explaining the variance in individual persons, we could focus on explaining variation in the relations.

In this final section, we will look at several statistical models that seek to predict the presence or absence (or strength) of a tie between two actors.  Models like this are focusing directly on a very sociological question:  what factors affect the likelihood that two individuals will have a relationship?

One obvious, but very important, predictor of whether two actors are likely to be connected is their similarity or closeness.  In many sociological theories, two actors who share some attribute are predicted to be more likely to form social ties than two actors who do not.  This "homophily" hypothesis is at the core of many theories of differentiation, solidarity, and conflict.  Two actors who are closer to one another in a network are often hypothesized to be more likely to form ties; two actors who share attributes are likely to be at closer distances to one another in networks.

Several of the models below explore homophily and closeness to predict whether actors have ties, or are close to one another.  The last model that we will look at, the "P1" model, also seeks to explain relations.  The P1 model tries to predict whether there exists no relation, an asymmetrical relation, or a reciprocated tie between pairs of actors.  Rather than using attributes or closeness as predictors, however, the P1 model focuses on basic network properties of each actor and the network as a whole (in-degree, out-degree, global reciprocity).  This type of model -- a probability model for the presence/absence of each possible relation in a graph as a function of network structures -- is one of the major continuing areas of development in social network methods.

One of the most commonplace sociological observations is that "birds of a feather flock together."  The notion that similarity (or homophily) increases the probability of the formation of social ties is central to most sociological theories.  The homophily hypothesis can be read to be making a prediction about social networks.  It suggests that if two actors are similar in some way, it is more likely that there will be network ties between them.  If we look at a social network that contains two types of actors, the density of ties ought to be greater within each group than between groups.

Tools>Testing Hypotheses>Mixed Dyadic/Nodal>Categorical Attributes>Joint-Count provides a test that the density of ties within and between two groups differs from what we would expect if ties were distributed at random across all pairs of nodes.

The procedure takes a binary graph and a partition (that is, a vector that classifies each node as being in one group or the other), and permutes and blocks the data.  If there was no association between sharing the same attribute (i.e. being in the same block) and the likelihood of a tie between two actors, we can predict the number of ties that ought to be present in each of the four blocks of the graph (that is:  group 1 by group 1; group 1 by group 2; group 2 by group 1; and group 2 by group 2).  These four "expected frequencies" can then be compared to the four "observed frequencies."  The logic is exactly the same as the Pearson Chi-square test of independence -- we can generate a "test statistic" that shows how far the 2 by 2 table departs from "independence" or "no association."

To test the inferential significance of departures from randomness, however, we cannot rely on standard statistical tables.  Instead, a large number of random graphs with the same overall density and the same sized partitions are calculated.  The sampling distribution of differences between observed and expected for random graphs can then be calculated, and used to assess the likelihood that our observed graph could be a result of a random trial from a population where there was no association between group membership and the likelihood of a relation.
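A bare-bones version of the joint-count logic might look like the following.  It is a sketch under simplifying assumptions (a binary symmetric adjacency matrix, ordered pairs counted in both directions, and only blocks that contain at least one observed tie); the six-node example graph is invented.

```python
import random

def joint_count_test(adj, partition, n_trials=5000, seed=3):
    """Joint-count test: compare observed within/between-block tie counts
    to those in graphs where the partition labels are randomly permuted."""
    n = len(adj)

    def block_counts(part):
        counts = {}
        for i in range(n):
            for j in range(n):
                if i != j and adj[i][j]:
                    key = tuple(sorted((part[i], part[j])))
                    counts[key] = counts.get(key, 0) + 1
        return counts

    observed = block_counts(partition)
    rng = random.Random(seed)
    labels = list(partition)
    sums = {k: 0.0 for k in observed}
    extreme = {k: 0 for k in observed}
    for _ in range(n_trials):
        rng.shuffle(labels)       # random reassignment of group labels
        c = block_counts(labels)
        for k in observed:
            v = c.get(k, 0)
            sums[k] += v
            if v >= observed[k]:  # trials at least as extreme as observed
                extreme[k] += 1
    expected = {k: sums[k] / n_trials for k in observed}
    p = {k: extreme[k] / n_trials for k in observed}
    return observed, expected, p

# invented graph: two cliques of three, no between-group ties
adj = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
obs, exp, p = joint_count_test(adj, [0, 0, 0, 1, 1, 1])
```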

To illustrate, if two large political donors contributed on the same side of political campaigns (across 48 initiative campaigns), we code them "1" as having a tie or relation, otherwise, we code them zero.  We've divided our large political donors in California initiative campaigns into two groups -- those that are affiliated with "workers" (e.g. unions, the Democratic party), and those that are not.

We would anticipate that two groups that represent workers interests would be more likely to share the tie of being in coalitions to support initiatives than would two groups drawn at random.  Figure 18.16 shows the results of Tools>Testing Hypotheses>Mixed Dyadic/Nodal>Categorical Attributes>Joint-Count  applied to this problem.

Figure 18.16.  Test for two-group differences in tie density

The partition vector (group identification variable) was originally coded as zero for non-worker donors and one for worker donors.  These have been re-labeled in the output as one and two.  We've used the default of 10,000 random graphs to generate the sampling distribution for group differences.

The first row, labeled "1-1" tells us that, under the null hypothesis that ties are randomly distributed across all actors (i.e. group makes no difference), we would expect 30.356 ties to be present in the non-worker to non-worker block.  We actually observe 18 ties in this block, about 12 fewer than would be expected.  A negative difference this large occurred only 2.8% of the time in graphs where the ties were randomly distributed.  It is clear that we have a deviation from randomness within the "non-worker" block.  But the difference does not support homophily -- it suggests just the opposite: ties between actors who share the attribute of not representing workers are less likely than random, rather than more likely.

The second row, labeled "1-2" shows no significant difference between the number of ties observed between worker and non-worker groups and what would happen by chance under the null hypothesis of no effect of shared group membership on tie density.

The third row, labeled "2-2," shows that the observed count of ties among interest groups representing workers (21) is much greater than expected by chance (5.3).  A difference this large would almost never be observed if the null hypothesis of no group effect on the probability of ties were true.

Perhaps our result does not support homophily theory because the group "non-worker" is not really a social group at all -- just a residual collection of diverse interests.  Using Tools>Testing Hypotheses>Mixed Dyadic/Nodal>Categorical Attributes>Relational Contingency-Table Analysis, we can expand the number of groups to provide a better test.  This time, let's categorize the political donors as representing "others," "capitalists," or "workers."  The results of this test are shown as figure 18.17.

Figure 18.17.  Test for three-group differences in tie density

The "other" group has been re-labeled "1," the "capitalist" group re-labeled "2," and the "worker" group re-labeled "3."  There are 87 total ties in the graph, with the observed frequencies shown ("Cross-classified Frequencies).

We can see that the observed frequencies differ from the "Expected Values Under Model of Independence."  The magnitudes of the over- and under-representation are shown as "Observed/Expected."  We note that all three diagonal cells (that is, ties within groups) now display homophily -- greater than random density.

A Pearson chi-square statistic is calculated (74.217).  And, we are shown the average tie counts in each cell that occurred in the 10,000 random trials.  Finally, we observe that p < .0002.  That is, the deviation of ties from randomness is so great that it would happen only very rarely if the no-association model was true.

The result in the section above seems to support homophily (which we can see by looking at where the deviations from independence occur).  The statistical test, though, is just a global test of difference from random distribution.  The routine Tools>Testing Hypotheses>Mixed Dyadic/Nodal>Categorical Attributes>ANOVA Density Models provides tests of some quite specific homophily models.

The least-specific notion of how members of groups relate to members of other groups is simply that the groups differ.  Members of one group may prefer to have ties only within their group; members of another group might prefer to have ties only outside of their group.

The Structural Blockmodel option of Tools>Testing Hypotheses>Mixed Dyadic/Nodal>Categorical Attributes>ANOVA Density Models provides a test that the patterns of within and between group ties differ across groups -- but does not specify in what way they may differ.  Figure 18.18 shows the results of fitting this model to the data on strong coalition ties (sharing 4 or more campaigns) among "other," "capitalist," and "worker" interest groups.

Figure 18.18.  Structural blockmodel of differences in group tie density

The observed density table is shown first.  Members of the "other" group have a low probability of being tied to one another (.143) or to "capitalists" (.143), but somewhat stronger ties to "workers" (.250).  Only the "workers" (category 2, row 3) show strong tendencies toward within-group ties.

Next, a regression model is fit to the data.  The presence or absence of a tie between each pair of actors is regressed on a set of dummy variables that represent each of the cells of the 3-by-3 table of blocks.  In this regression, the last block (i.e. 3-3) is used as the reference category.  In our example, the differences among blocks explain 27.6% of the variance in the pair-wise presence or absence of ties.  The probability of a tie between two actors, both of whom are in the "workers" block (block 3) is 1.000.  The probability in the block describing ties between "other" and "other" actors (block 1-1) is .857 less than this.
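Because an OLS regression on a full set of block dummies simply reproduces the block densities, the fit of a structural blockmodel can be sketched directly from those densities.  The toy graph below is invented; R-squared compares each dyad's tie value to its block's density.

```python
def block_densities(adj, partition):
    """Density of ties within each block: for every ordered pair of
    distinct actors, tie values are averaged within the block defined
    by the pair's group memberships."""
    n = len(adj)
    sums, counts = {}, {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            key = (partition[i], partition[j])
            sums[key] = sums.get(key, 0) + adj[i][j]
            counts[key] = counts.get(key, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}

def blockmodel_r2(adj, partition):
    """R-squared of the structural blockmodel: each dyad's predicted
    value is the density of its block."""
    dens = block_densities(adj, partition)
    n = len(adj)
    pairs = [(adj[i][j], dens[(partition[i], partition[j])])
             for i in range(n) for j in range(n) if i != j]
    mean = sum(t for t, _ in pairs) / len(pairs)
    ss_tot = sum((t - mean) ** 2 for t, _ in pairs)
    ss_res = sum((t - d) ** 2 for t, d in pairs)
    return 1 - ss_res / ss_tot

# invented graph: two cliques of three, no between-group ties
adj = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
part = [0, 0, 0, 1, 1, 1]
dens = block_densities(adj, part)   # dens[(0, 0)] -> 1.0
r2 = blockmodel_r2(adj, part)       # -> 1.0 (blocks predict ties exactly)
```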

The statistical significance of this model cannot be properly assessed using standard formulas for independent observations.  Instead, 5000 trials with random permutations of the presence and absence of ties between pairs of actors have been run, and estimated standard errors calculated from the resulting simulated sampling distribution.

A much more restricted notion of group differences is named the Constant Homophily model in Tools>Testing Hypotheses>Mixed Dyadic/Nodal>Categorical Attributes>ANOVA Density Models.  This model proposes that all groups may have a preference for within-group ties, but that the strength of the preference is the same within all groups.  The results of fitting this model to the data are shown in figure 18.19.

Figure 18.19.  Constant Homophily blockmodel of differences in group tie density

Given what we observed in looking directly at the block densities (shown in figure 18.18), it is not surprising that the constant homophily model does not fit these data well.  We know that two of the groups ("others" and "capitalists") have no apparent tendency to homophily -- and that differs greatly from the "workers" group.  The block model of group differences only accounts for 4.3% of the variance in pair-wise ties; however, permutation trials suggest that this is not a random result (p = .001).

This model only has two parameters, because the hypothesis is proposing a simple difference between the diagonal cells (the within group ties 1-1, 2-2, and 3-3) and all other cells.  The hypothesis is that the densities within these two partitions are the same.  We see that the estimated average tie density of pairs who are not in the same group is .193 -- there is a 19.3% chance that heterogeneous dyads will have a tie.  If the members of the dyad are from the same group, the probability that they share a tie is .196 greater, or .389.
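The two parameters of the constant homophily model can be computed directly as densities: one density for all mixed-group dyads (the intercept) and a single shared increment for same-group dyads.  A minimal sketch with an invented graph in which homophily is perfect:

```python
def constant_homophily(adj, partition):
    """Constant-homophily fit: intercept = density of mixed-group dyads;
    effect = same-group density minus the intercept."""
    n = len(adj)
    within = between = wn = bn = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if partition[i] == partition[j]:
                within += adj[i][j]
                wn += 1
            else:
                between += adj[i][j]
                bn += 1
    intercept = between / bn
    effect = within / wn - intercept
    return intercept, effect

# invented graph: all within-group ties present, no between-group ties
adj = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
intercept, effect = constant_homophily(adj, [0, 0, 0, 1, 1, 1])
# intercept -> 0.0, effect -> 1.0
```

In the donors example above, the same arithmetic yields the intercept of .193 and the homophily effect of .196.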

So, although the model of constant homophily does not predict individuals' ties at all well, there is a notable overall homophily effect.

We noted that the strong tendency toward within-group ties appears to describe only the "workers" group.  A third block model, labeled Variable Homophily by Tools>Testing Hypotheses>Mixed Dyadic/Nodal>Categorical Attributes>ANOVA Density Models, tests the model that each diagonal cell (that is, ties within group 1, within group 2, and within group 3) differs from all ties that are not within-group.  Figure 18.20 displays the results.

Figure 18.20.  Variable homophily blockmodel of differences in group tie density

This model fits the data much better (R-square = .269, with p < .000) than the constant homophily model.  It also fits the data nearly as well as the un-restricted structural block model (figure 18.18), but is simpler.

Here, the intercept is the probability that there will be a dyadic tie between any two members of different groups (.193).  We see that the probability of within-group ties among group 1 ("others") is actually .05 less than this (but not significantly different).  Within-group ties among capitalist interest groups (group 2) are very slightly less common (-.01) than heterogeneous group ties (again, not significant).  Ties among interest groups representing workers (group 3), however, are dramatically more prevalent (.81) than ties within heterogeneous pairs.

In our example, we noted that one group seems to display in-group ties, and others do not.  One way of thinking about this pattern is a "core-periphery" block model.  There is a strong form, and a more relaxed form of the core-periphery structure.

The Core-periphery 1 model supposes that there is a highly organized core (many ties within the group), but that there are few other ties -- either among members of the periphery, or between members of the core and members of the periphery.  Figure 18.21 shows the results of fitting this block model to the California donors data.

Figure 18.21.  "Strong" core-periphery block model of California political donors

It's clear that this model does not do a good job of describing the pattern of within and between group ties.  The R-square is very low (.008), and results this small would occur 39.4% of the time in trials from randomly permuted data.  The (non-significant) regression coefficients show density (or the probability of a tie between two random actors) in the periphery as .27, and the density in the "Core" as .12 less than this.  Since the "core" is, by definition, the maximally dense area, it appears that the output in version 6.8.5 may be mis-labeled.

Core-Periphery 2 offers a more relaxed block model in which the core remains densely tied within itself, but is allowed to have ties with the periphery.  The periphery is, to the maximum degree possible, a set of cases with no ties within their group.  Figure 18.22 shows the results of this model for the California political donors.

Figure 18.22.  "Relaxed" core-periphery block model of California political donors

The fit of this model is better (R-square = .037) but still very poor.  Results this strong would occur about 11% of the time in trials from randomly permuted data.  The intercept density (which we interpret as the "non-periphery") is higher (about 35% of all ties are present), and the probability of a tie between two cases in the periphery is .17 lower.

The homophily hypothesis is often thought of in categorical terms:  is there a tendency for actors who are of the same "type" to be adjacent (or close) to one another in the network?

This idea, though, can be generalized to continuous attributes:  is there a tendency for actors who have more similar attributes to be located closer to one another in a network?

UCINET's Tools>Testing Hypotheses>Mixed Dyadic/Nodal>Continuous Attributes>Moran/Geary statistics provides two measures that address the question of the "autocorrelation" between actors' scores on interval-level measures of their attributes and the network distance between them.  The two measures (Moran's I and Geary's C) are adapted for social network analysis from their origins in geography, where they were developed to measure the extent to which the similarity of the geographical features of any two places was related to the spatial distance between them.

Let's suppose that we were interested in whether there was a tendency for political interest groups that were "close" to one another to spend similar amounts of money.  We might suppose that interest groups that are frequent allies may also influence one another in terms of the levels of resources they contribute -- that a sort of norm of expected levels of contribution arises among frequent allies.

Using information about the contributions of very large donors (who gave over a total of $5,000,000) to at least four (of 48) ballot initiatives in California, we can illustrate the idea of network autocorrelation.

First, we create an attribute file that contains a column that has the attribute score of each node, in this case, the amount of total expenditures by the donors.

Second, we create a matrix data set that describes the "closeness" of each pair of actors.  There are several alternative approaches here.  One is to use an adjacency (binary) matrix.  We will illustrate this by coding two donors as adjacent if they contributed funds on the same side of at least four campaigns (here, we've constructed adjacency from "affiliation" data; often we have a direct measure of adjacency, such as one donor naming another as an ally).  We could also use a continuous measure of the strength of the tie between actors as a measure of "closeness."  To illustrate this, we will use a scale of the similarity of the contribution profiles of donors that ranges from negative numbers (indicating that two donors gave money on opposite sides of initiatives) to positive numbers (indicating the number of times they donated on the same side of issues).  One can easily imagine other approaches to indexing the network closeness of actors (e.g. 1/geodesic distance).  Any "proximity" matrix that captures the pair-wise closeness of actors can be used (for some ideas, see Tools>Similarities and Tools>Dissimilarities and Distances ).
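The adjacency-from-affiliation coding described above can be sketched as follows.  The signed participation matrix is hypothetical (+1 = supported a campaign, -1 = opposed, 0 = did not participate), and the threshold of four shared campaigns mirrors the coding rule used in the text.

```python
def adjacency_from_affiliation(signed_participation, threshold=4):
    """Derive a binary adjacency matrix from affiliation data: two donors
    are adjacent if they took the same side in at least `threshold`
    campaigns.  Rows are donors, columns are campaigns."""
    n = len(signed_participation)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            same = sum(1 for a, b in zip(signed_participation[i],
                                         signed_participation[j])
                       if a != 0 and a == b)  # same non-zero side
            if same >= threshold:
                adj[i][j] = 1
    return adj

# hypothetical participation of three donors in five campaigns
donors = [[1, 1, 1, 1, 0],
          [1, 1, 1, 1, -1],
          [1, -1, -1, -1, 0]]
adj = adjacency_from_affiliation(donors)
# donors 0 and 1 share four sides -> adjacent; donors 0 and 2 share one -> not
```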

Figures 18.23 and 18.24 display the results of Tools>Testing Hypotheses>Mixed Dyadic/Nodal>Continuous Attributes>Moran/Geary statistics where we have examined the autocorrelation of the levels of expenditures of actors using adjacency as our measure of network distance.  Very simply:  do actors who are adjacent in the network tend to give similar amounts of money?  Two statistics and some related information are presented (the Moran statistic in figure 18.23, and the Geary statistic in figure 18.24).

Figure 18.23.  Moran autocorrelation of expenditure levels by political donors with network adjacency

The Moran "I" statistic of autocorrelation (originally developed to measure spatial autocorrelation, but used here to measure network autocorrelation) ranges from -1.0 (perfect negative correlation) through 0 (no correlation) to +1.0 (perfect positive correlation).  Here we see the value of -.119, indicating that there is a very modest tendency for actors who are adjacent to differ more in how much they contribute than two random actors.  If anything, it appears that coalition members may vary more in their contribution levels than random actors -- another hypothesis bites the dust!

The Moran statistic (see any geo-statistics text, or do a Google search) is constructed very much like a regular correlation coefficient.  It indexes the product of the differences between the scores of two actors and the mean, weighted by the actors' closeness -- that is, a covariance weighted by the proximity of actors.  This sum is taken in ratio to the variance in the scores of all actors from the mean.  The resulting measure, like the correlation coefficient, is a ratio of covariance to variance, and has a conventional interpretation.
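In formula terms, Moran's I for a weight matrix w is I = (n/W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of the weights.  A minimal Python sketch with an invented four-actor example, linking actors with identical scores (and then with opposite scores) to show the two extremes:

```python
def morans_i(x, w):
    """Moran's I network autocorrelation: pairwise covariance weighted by
    closeness w[i][j], in ratio to the overall variance of the scores."""
    n = len(x)
    mean = sum(x) / n
    num = sum(w[i][j] * (x[i] - mean) * (x[j] - mean)
              for i in range(n) for j in range(n))
    den = sum((xi - mean) ** 2 for xi in x)
    wsum = sum(w[i][j] for i in range(n) for j in range(n))
    return (n / wsum) * num / den

x = [1, 1, 5, 5]                    # invented attribute scores
# adjacency linking like-scored actors (0-1, 2-3)
w_like = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
# adjacency linking unlike-scored actors (0-2, 1-3)
w_unlike = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
moran_pos = morans_i(x, w_like)     # -> 1.0 (perfect positive)
moran_neg = morans_i(x, w_unlike)   # -> -1.0 (perfect negative)
```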

Permutation trials are used to create a sampling distribution.  Across many (in our example 1,000) trials, scores on the attribute (expenditure, in this case) are randomly assigned to actors, and the Moran statistic calculated.  In these random trials, the average observed Moran statistic is -.043, with a standard deviation of .073.  The difference between what we observe (-.119) and what is predicted by random association (-.043) is small relative to sampling variability.  In fact, 17.4% of all samples from random data showed correlations at least this big -- far more than the conventional 5% acceptable error rate.

The Geary measure of correlation is calculated and interpreted somewhat differently.  Results are shown in figure 18.24 for the association of expenditure levels by network adjacency.

Figure 18.24.  Geary autocorrelation of expenditure levels by political donors with network adjacency

The Geary statistic has a value of 1.0 when there is no association.  Values less than 1.0 indicate a positive association (somewhat confusingly), values greater than 1.0 indicate a negative association.  Our calculated value of 2.137 indicates negative autocorrelation, just as the Moran statistic did.  Unlike the Moran statistic though, the Geary statistic suggests that the difference of our result from the average of 1,000 random trials (1.004) is statistically significant (p = .026).

The Geary statistic is sometimes described in the geo-statistics literature as being more sensitive to "local" differences than to "global" differences.  The Geary C statistic is constructed by examining the differences between the scores of each pair of actors, and weighting this by their adjacency.  The Moran statistic is constructed by looking at differences between each actor's score and the mean, and weighting the cross-products.  The difference in approach means that the Geary statistic is more focused on how different the members of each pair are from each other (a "local" difference); the Moran statistic is focused more on how similar or dissimilar each pair is to the overall average (a "global" difference).
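Geary's C, by contrast, weights squared pairwise differences: C = ((n-1)/2W) * sum_ij w_ij (x_i - x_j)^2 / sum_i (x_i - mean)^2.  A toy sketch with the same invented data as above, showing the two ends of the scale (recall that 1.0 means no association, values below 1.0 positive association, and values above 1.0 negative association):

```python
def gearys_c(x, w):
    """Geary's C: pairwise squared differences weighted by closeness,
    in ratio to variance.  1 = no association; <1 positive; >1 negative."""
    n = len(x)
    mean = sum(x) / n
    num = sum(w[i][j] * (x[i] - x[j]) ** 2
              for i in range(n) for j in range(n))
    den = sum((xi - mean) ** 2 for xi in x)
    wsum = sum(w[i][j] for i in range(n) for j in range(n))
    return ((n - 1) / (2 * wsum)) * num / den

x = [1, 1, 5, 5]                    # invented attribute scores
w_like = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
w_unlike = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
c_pos = gearys_c(x, w_like)     # -> 0.0 (adjacent actors identical)
c_neg = gearys_c(x, w_unlike)   # > 1.0 (adjacent actors maximally different)
```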

In data where the "landscape" of values displays a lot of variation, and non-normal distribution, the two measures are likely to give somewhat different impressions about the effects of network adjacency on similarity of attributes.  As always, it's not that one is "right" and the other "wrong."  It's always best to compute both, unless you have strong theoretical priors that suggest that one is superior for a particular purpose.

Figures 18.25 and 18.26 repeat the exercise above, but with one difference.  In these two examples, we measure the closeness of two actors in the network on a continuous scale.  Here, we've used the net number of campaigns on which each pair of actors were in the same coalition as a measure of closeness.  Other measures, like geodesic distances might be more commonly used for true network data (rather than a network inferred from affiliation).

Figure 18.25.  Moran autocorrelation of expenditure levels by political donors with network closeness

Using a continuous measure of network closeness (instead of adjacency), we might expect a stronger correlation. The Moran measure is now -.145 (compared to -.119), and is significant at p = .018. There is a small but significant tendency for actors who are "close" allies to give different amounts of money than two randomly chosen actors -- a negative network autocorrelation.

Figure 18.26.  Geary autocorrelation of expenditure levels by political donors with network closeness

The Geary measure has become slightly smaller in size (1.836 versus 2.137) using a continuous measure of network distance.  The result also indicates a negative autocorrelation, and one that would rarely occur by chance if there truly was no association between network distance and expenditure.

The approaches that we've been examining in this section look at the relationship between actors' attributes and their location in a network. Before closing our discussion of how statistical analysis has been applied to network data, we need to look at one approach that examines how ties between pairs of actors relate to particularly important relational attributes of the actors, and to a more global feature of the graph.

For any pair of actors in a directed graph, there are three possible relationships: no ties, an asymmetric tie, or a reciprocated tie. Network>P1 is a regression-like approach that seeks to predict the probability of each of these kinds of relationships for each pair of actors. This differs a bit from the approaches that we've examined so far, which seek to predict either the presence/absence of a tie, or the strength of a tie.

The P1 model (and its newer successor the P* model), seek to predict the dyadic relations among actor pairs using key relational attributes of each actor, and of the graph as a whole.  This differs from most of the approaches that we've seen above, which focus on actor's individual or relational attributes, but do not include overall structural features of the graph (at least not explicitly).

The P1 model consists of three prediction equations, designed to predict the probability of a mutual (i.e. reciprocated) relation (m_ij), an asymmetric relation (a_ij), or a null relation (n_ij) between actors. The equations, as stated by the authors of UCINET, are:

m_ij = lambda_ij exp(rho + 2*theta + alpha_i + alpha_j + beta_i + beta_j)

a_ij = lambda_ij exp(theta + alpha_i + beta_j)

n_ij = lambda_ij

The first equation says that the probability of a reciprocated tie between two actors is a function of the out-degree (or "expansiveness") of each actor: alpha_i and alpha_j. It is also a function of the overall density of the network (theta), and of the global tendency in the whole network toward reciprocity (rho). The equation also contains the "attractiveness" (in-degree effect) of each actor (beta_i and beta_j), as well as a scaling constant for each pair (lambda_ij).

The second equation describes the probability that two actors will be connected with an asymmetric relation.  This probability is a function of the overall network density (theta), and the propensity of one actor of the pair to send ties (expansiveness, or alpha), and the propensity of the other actor to receive ties ("attractiveness" or beta).

The probability of a null relation (no tie) between two actors is a "residual."  That is, if ties are not mutual or asymmetric, they must be null.  Only the scaling constant "lambda," and no causal parameters enter the third equation.

The core idea here is that we try to understand the relations between pairs of actors as functions of individual relational attributes (individual's tendencies to send ties, and to receive them) as well as key features of the graph in which the two actors are embedded (the overall density and overall tendency towards reciprocity).  More recent versions of the model (P*, P2) include additional global features of the graph such as tendencies toward transitivity and the variance across actors in the propensity to send and receive ties.
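Given estimated parameters, the three expressions determine the probabilities of the four possible states of a dyad (mutual, asymmetric in either direction, null), with the dyadic constant lambda_ij recoverable as the normalizing factor. A minimal sketch of this calculation follows; the function name and the alpha/beta values are made up for illustration, while theta and rho are the global estimates from the Knoke analysis discussed below.

```python
import math

def p1_dyad_probs(theta, rho, alpha_i, alpha_j, beta_i, beta_j):
    """Probabilities of the four dyad states under the p1 model.
    lambda_ij never has to be supplied: it is just the constant that
    makes the four probabilities sum to one."""
    mutual = math.exp(rho + 2 * theta + alpha_i + alpha_j + beta_i + beta_j)
    asym_ij = math.exp(theta + alpha_i + beta_j)   # i sends, j receives
    asym_ji = math.exp(theta + alpha_j + beta_i)   # j sends, i receives
    null = 1.0
    total = mutual + asym_ij + asym_ji + null
    return {"mutual": mutual / total, "asym_ij": asym_ij / total,
            "asym_ji": asym_ji / total, "null": null / total}

# theta and rho are the global estimates reported for the Knoke network;
# the alpha/beta values here are hypothetical.
probs = p1_dyad_probs(theta=-1.6882, rho=3.5151,
                      alpha_i=0.5, alpha_j=0.2, beta_i=0.1, beta_j=-0.3)
print(round(sum(probs.values()), 6))  # 1.0
```

Comparing these predicted probabilities to the observed dyad states yields the residuals discussed below.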

Figure 18.27 shows the results of fitting the P1 model to the Knoke binary information network.

Figure 18.27.  Results of P1 analysis of Knoke information network

The technical aspects of the estimation of the P1 model are complicated, and maximum likelihood methods are used.  A G-square (likelihood ratio chi-square) badness of fit statistic is provided, but has no direct interpretation or significance test.

Two descriptive parameters for global network properties are given:

Theta = -1.6882 refers to the effect of the global density of the network on the probability of reciprocated or asymmetric ties between pairs of actors.

Rho = 3.5151 refers to the effect of the overall amount of reciprocity in the global network on the probability of a reciprocated tie between any pair of actors.

Two descriptive parameters are given for each actor (these are estimated across all of the pair-wise relations of each actor):

Alpha ("expansiveness") refers to the effect of each actor's out-degree on the probability that they will have reciprocated or asymmetric ties with other actors.  We see, for example, that the Mayor (actor 5) is a relatively "expansive" actor.

Beta ("attractiveness") refers to the effect of each actor's in-degree on the probability that they will have a reciprocated or asymmetric relation with other actors.  We see here, for example, that the welfare rights organization (actor 6) is very likely to be shunned.

Using the equations, it is possible to predict the probability of each directed tie based on the model's parameters.  These are shown as the "P1 expected values."  For example, the model predicts a 93% chance of a tie from actor 1 to actor 2.

The final panel of the output shows the difference between the ties that actually exist, and the predictions of the model.  The model predicts the tie from actor 1 to actor 2 quite well (residual = .07), but it does a poor job of predicting the relation from actor 1 to actor 9 (residual = .77).

The residuals are important because they suggest places where other features of the graph or individuals may be relevant to understanding particular dyads, or where the ties between two actors are well accounted for by the basic "demographics" of the network. Which actors are likely to have ties that are not predicted by the parameters of the model can also be shown in a dendrogram, as in figure 18.28.

Figure 18.28.  Diagram of P1 clustering of Knoke information network

Here we see, for example, that actors 3 and 6 are much more likely to have ties than the P1 model predicts.

In this chapter we've taken a look at some of the most basic and common approaches to applying statistical analysis to the attributes of actors embedded in networks, the relations among these actors, and the similarities between multiple relational networks connecting the same actors.  We've covered a lot of ground.  But, there is still a good bit more, as the application of statistical modeling to network data is one of the "leading edges" of the field of social (and other) network analyses.

There are two main reasons for the interest in applying statistics to what was, originally, deterministic graph theory from mathematics.  First, for very large networks, methods for finding and describing the distributions of network features provide important tools for understanding the likely patterns of behavior of the whole network and the actors embedded in it.  Second, we have increasingly come to realize that the relations we see among actors in a network at a point in time are best seen as probabilistic ("stochastic") outcomes of underlying processes of evolution of networks, and probabilistic actions of actors embedded in those networks.  Statistical methods provide ways of dealing with description and hypothesis testing that take this uncertainty into account.

We've reviewed methods for examining relations between two (or more) graphs involving the same actors.  These tools are particularly useful for trying to understand multi-plex relations, and for testing hypotheses about how the pattern of relations in one whole network relate to the pattern of relations in another.

We've also looked at tools that deal with individual nodes. These tools allow us to examine hypotheses about the relational and non-relational attributes of actors, and to draw correct inferences about relations between variables when the observations (actors) are not independent.

And, we've taken a look at a variety of approaches that relate attributes of actors to their positions in networks. Much of the focus here is on how attributes may pattern relations (e.g. homophily), or how network closeness or distance may affect similarity of attributes (or vice versa).

Taken together, the marriage of statistics and mathematics in social network analysis has already produced some very useful ways of looking at patterns of social relations.  It is likely that this interface will be one of the areas of most rapid development in the field of social network methods in the coming years.

QAP: The Quadratic Assignment Procedure. Conference presentation, Harvard Business School, 20 Mar 2001 - 21 Mar 2001.

Combinatorial Programming: Methods and Applications, pp. 351–360

The Quadratic Assignment Problem: A Brief Review

E. L. Lawler

Part of the book series: NATO Advanced Study Institutes Series (ASIC, volume 19)

The quadratic assignment problem, its applications, and various exact and inexact procedures for its solution are briefly reviewed. Certain special cases of the one-dimensional module placement problem, itself a special case of the quadratic assignment problem, are surveyed.


The preparation of this paper was supported in part by the U.S. Air Force Office of Scientific Research, Grant AFOSR-71–2076.




Lawler, E.L. (1995). The Quadratic Assignment Problem: A Brief Review. In: Roy, B. (eds) Combinatorial Programming: Methods and Applications. NATO Advanced Study Institutes Series, vol 19. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-7557-9_20



qaptest: Perform Quadratic Assignment Procedure (QAP) Hypothesis Tests for Graph-Level Statistics. In sna: Tools for Social Network Analysis.

View source: R/gtest.R

Description

qaptest tests an arbitrary graph-level statistic (computed on dat by FUN) against a QAP null hypothesis, via Monte Carlo simulation of likelihood quantiles. Note that a fair amount of flexibility is possible regarding QAP tests on functions of such statistics (see an equivalent discussion with respect to CUG null hypothesis tests in Anderson et al. (1999)). See below for more details.

The null hypothesis of the QAP test is that the observed graph-level statistic on graphs G_1,G_2,… was drawn from the distribution of said statistic evaluated (uniformly) on the set of all relabelings of G_1,G_2,… . Pragmatically, this test is performed by repeatedly (randomly) relabeling the input graphs, recalculating the test statistic, and then evaluating the fraction of draws greater than or equal to (and less than or equal to) the observed value. This accumulated fraction approximates the integral of the distribution of the test statistic over the set of unlabeled input graphs.

The qaptest procedure returns a qaptest object containing the estimated likelihood (distribution of the test statistic under the null hypothesis), the observed value of the test statistic on the input data, and the one-tailed p-values (estimated quantiles) associated with said observation. As usual, the (upper tail) null hypothesis is rejected for significance level alpha if p>=observation is less than alpha (or p<=observation, for the lower tail); if the hypothesis is undirected, then one rejects if either p<=observation or p>=observation is less than alpha/2. Standard caveats regarding the use of null hypothesis testing procedures are relevant here: in particular, bear in mind that a significant result does not necessarily imply that the likelihood ratio of the null model and the alternative hypothesis favors the latter.
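The relabeling loop described above is easy to sketch. The following is an illustrative re-implementation of the idea in Python/NumPy, not the sna source: the statistic chosen here (Pearson correlation of off-diagonal adjacency entries) and all names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)

def offdiag_corr(a, b):
    """Graph correlation: correlation of the off-diagonal entries."""
    mask = ~np.eye(len(a), dtype=bool)
    return np.corrcoef(a[mask], b[mask])[0, 1]

def qap_test(a, b, reps=1000):
    """Monte Carlo QAP: repeatedly relabel b's vertices (permuting rows
    and columns together), recompute the statistic, and report the
    fraction of draws >= and <= the observed value."""
    obs = offdiag_corr(a, b)
    null = np.empty(reps)
    for k in range(reps):
        p = rng.permutation(len(b))
        null[k] = offdiag_corr(a, b[np.ix_(p, p)])
    return obs, (null >= obs).mean(), (null <= obs).mean()

# Toy data: b is an exact copy of a, so the observed correlation is 1.0
# and should sit in the extreme upper tail of the null distribution.
a = (rng.random((10, 10)) < 0.3).astype(float)
np.fill_diagonal(a, 0)
b = a.copy()
obs, p_ge, p_le = qap_test(a, b, reps=500)
```

The upper-tail fraction p_ge plays the role of the p>=observation quantile described above.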

In interpreting a QAP test, it is important to bear in mind the nature of the QAP null hypothesis. The QAP test should not be interpreted as evaluating underlying structural differences; indeed, QAP is more accurately understood as testing differences induced by a particular vertex labeling controlling for underlying structure. Where there is substantial automorphism in the underlying structures, QAP will tend to give non-significant results. (In fact, it is impossible to obtain a one-tailed significance level in excess of max_[g in {G,H}] |Aut(g)|/|Perm(g)| when using a QAP test on a bivariate graph statistic f(G,H), where Aut(g) and Perm(g) are the automorphism and permutation groups on g, respectively. This follows from the fact that all members of Aut(g) will induce the same values of f().) By turns, significance under QAP does not necessarily imply that the observed structural relationship is unusual relative to what one would expect from typical structures with (for instance) the sizes and densities of the graphs in question. In contexts in which one's research question implies a particular labeling of vertices (e.g., "within this group of individuals, do friends also tend to give advice to one another"), QAP can be a very useful way of ruling out spurious structural influences (e.g., some respondents tend to indiscriminately nominate many people (without regard to whom), resulting in a structural similarity which has nothing to do with the identities of those involved). Where one's question does not imply a labeled relationship (e.g., is the shape of this group's friendship network similar to that of its advice network), the QAP null hypothesis is inappropriate.

Value: an object of class qaptest, containing the estimated null distribution of the test statistic, the observed value, and the associated one-tailed p-values described above.

Carter T. Butts [email protected]

Anderson, B.S.; Butts, C.T.; and Carley, K.M. (1999). “The Interaction of Size and Density with Graph-Level Indices.” Social Networks , 21(3), 239-267.

Hubert, L.J., and Arabie, P. (1989). “Combinatorial Data Analysis: Confirmatory Comparisons Between Sets of Matrices.” Applied Stochastic Models and Data Analysis , 5, 273-325.

Krackhardt, D. (1987). “QAP Partialling as a Test of Spuriousness.” Social Networks, 9, 171-186.

Krackhardt, D. (1988). “Predicting With Networks: Nonparametric Multiple Regression Analyses of Dyadic Data.” Social Networks , 10, 359-382.

