Buddy AutoMagic - using network flow to connect students

December 22, 2013

A program to pair up people with respect to multiple weighted criteria, e.g. interests, language skills, age. The interface is a Google Spreadsheet and the pairing is solved using a network flow algorithm. This is a project I initiated in collaboration with the International Relations Office at my university. The original intention for the program was to simplify the organization of the university’s “Buddy program”. However, its use can be extended to pair up two groups of virtually any type.

#Motivation

The original idea comes from the problem of pairing incoming international students with local “buddies” (at some places called mentors or tutors). A local buddy volunteers to assist his exchange buddy in getting settled at the host university as well as to the local community. Sometimes, a local buddy and his exchange buddy also become good friends. Now, say you would ideally want every local buddy to come from the same faculty as his exchange buddy, be of similar age and speak a common language with his buddy. For example, if we have local students

name     age studies          speaks
Björn    27  engineering      english, french
Ólafur   22  computer science english, swedish
Hallveig 22  arts             english, spanish

and international students

name     age studies          speaks
Irene    25  arts             english, spanish
Jonas    22  engineering      english, swedish
Sara     30  computer science english, french

we could pair them up like this

local    int.   age1 age2 studies1 studies2 speaks1         speaks2
Björn    Sara   27   30   eng      cs       eng., french    eng., french
Ólafur   Jonas  22   22   cs       eng      eng., swedish   eng., swedish
Hallveig Irene  22   25   arts     arts     eng., spanish   eng., spanish

and roughly be able to meet every criterion. We are only swapping the engineering and computer science students (nothing severe).

Now, imagine doing this for over 100 students, which is a situation that comes up almost every semester for most universities that participate in student mobility programs (e.g. Erasmus, Conahec, Nordplus). This turns out to become fairly complicated since the number of ways to perform the pairing grows factorially, i.e. very fast.

#Demo

(edit 2016: the demo is still live but it does not run anymore)

I’ve made a small live preview of Buddy AutoMagic available on this Google spreadsheet. All the functionality of the program can be accessed through the Affinity menu (requires you to be logged in with your Google account).

The data is made up and shows how the system can work in real life. Each sheet has a specific purpose

Assignments is where the pairing appears
Locals and Internationals is where the data resides
Question Importances is where you configure the weight of each question
Affinity is where you see the ‘affinity’ score of all possible pairs
Variables is where you configure the question columns to dictionary keys in the underlying code

Unfortunately if you want to run the demo yourself, you will have to give the program permission to read all of your emails. This is not entirely necessary for the demo to run, but a useful feature if you wish to notify people of their buddy assignment.

If you are curious to know more how it works please contact me. I would be very excited to see it put to use in more places! The universities which have adopted it so far are Reykjavik University (my uni) and NTNU in Trondheim, Norway.

#What is happening in the background?

The general idea to solving this problem is that if we can quantify the similarity or affinity between each possible pair, we can find a pairing that maximises the outcome for everyone.

To quantify the affinity between two persons, we assign weights to each criterion. For instance, we could add 5 points if both students come from the same faculty, 5 points for each common language they speak and also 5 points if their age is less than three years apart. Continuing with our previous example, we would obtain table like this

         Irene Jonas Sara
Björn    10    15    15
Ólafur   5     15    10
Hallveig 20    10    5

A similar table can be seen on the Affinity sheet in the demo above.

Now, an optimal pairing is clearly one that selects the maximum value for every row. In this case it’s 15 + 15 + 20 = 50 which is the same pairing as we figured out above.

One might quickly assume that this turns out the be the result in most cases and an optimal solution is to simply greedily pick the highest values first. A small counter-example is sufficient to show why this is not a clever idea.

  C  D
A 10 9
B 8  2

By picking A-C we may get 10 points but we are also forced to pick B-D which only yields for us 2 points. An optimal pairing on the other hand, A-D and B-C, yields a whooping total score of 17 points!

#Maximum weighted bipartite matching

If we formulate the problem as here above, our optimal pairing lies in finding a maximum weighted matching on a complete bipartite graph. On one side we have the locals and the international students on the other side. We then assign the affinity score for each pair to be the cost of the edge connecting them. We leave the capacity for each student to be only one, if we wish to pair one local student with multiple international students we can do multiple runs.

Now, we can use this min cost max flow algorithm to solve the problem.

The running time is O(|V|^4 * MAX_EDGE_COST) which one might think to be too slow for any reasonably sized pairing. It turns out however that the algorithm finds a pairing within just a few seconds when V=200 and MAX_EDGE_COST=1000. I’m assuming one reason for this is that we may be cutting some running time by having all edge capacities carry the value 1, i.e. we are in fact only finding the minimum cost, not max flow. Also, it may be that the upper bound is not very tight.

It would be interesting to analyze the running time further. We will let this do for now since it serves out purpose well anyways.

#Alternative algorithms

It is worth mentioning that we did also look at some other algorithms.

#Hungarian algorithm

The [Hungarian algorithm][hungarian] is the most efficient algorithm for solving a maximum weighted bipartite matching. On the other hand, it also happens to be relatively complicated to implement. For that reason, we chose to stick with the previous algorithm. [Hungarian]: https://en.wikipedia.org/wiki/Hungarian_algorithm

#Stable marriage

The first algorithm we looked was the famous Gale and Shapley algorithm for solving the stable marriage (or matching) problem. This seemed to be ideal at first sight. However, we then realised that would require us to flatten the similarity scores down to a preference list. For instance, with the previous table, we would convert it into

         Irene Jonas Sara
Björn    3     (1,2) (1,2)    # Ambiguous
Ólafur   3     1     2
Hallveig 1     2     3

This has a couple of disadvantages

It’s ambiguous and,
a lot of useful information (e.g. scores 10, 9 and 2 tell us more than 1, 2 and 3) is left out.

For these two reasons, we decided to look for a better solution.

#Future work

In the future, I’m hoping to convert the Buddy AutoMagic scripts into a library. I am also looking to extend the collection of questions that might be useful in domains other than exchange student buddy programs.

Moreover, it would be interesting to research different variations of this problem. For instance, the case where we pair people from the same group (i.e. complete graph instead of bipartite) or if we wish to create teams of k people instead of only pairs of two.

#Acknowledgements

I would first of all like to thank the wonderful ladies at the international relations office for allowing me to try out this experiment on the actual Reykjavik University Buddy Program. I would also like to extend my thanks to Bjarki Ágúst, a fellow student, and Ýmir Vigfússon, my computer security professor, who helped me figure out which algorithm to use. Bjarki in addition ported the above min cost max flow C++ code into Javascript.