Buddy AutoMagic - using network flow to connect students
A program to pair up people with respect to multiple weighted criteria, e.g. interests, language skills, age. The interface is a Google Spreadsheet and the pairing is solved using a network flow algorithm. This is a project I initiated in collaboration with the International Relations Office at my university. The original intention for the program was to simplify the organization of the university's "Buddy program". However, its use can be extended to pair up two groups of virtually any type.
Motivation
The original idea comes from the problem of pairing incoming international students with local "buddies" (at some places called mentors or tutors). A local buddy volunteers to assist his exchange buddy in getting settled at the host university as well as to the local community. Sometimes, a local buddy and his exchange buddy also become good friends. Now, say you would ideally want every local buddy to come from the same faculty as his exchange buddy, be of similar age and speak a common language with his buddy. For example, if we have local students
name age studies speaks
Björn 27 engineering english, french
Ólafur 22 computer science english, swedish
Hallveig 22 arts english, spanish
and international students
name age studies speaks
Irene 25 arts english, spanish
Jonas 22 engineering english, swedish
Sara 30 computer science english, french
we could pair them up like this
local int. age1 age2 studies1 studies2 speaks1 speaks2
Björn Sara 27 30 eng cs eng., french eng., french
Ólafur Jonas 22 22 cs eng eng., swedish eng., swedish
Hallveig Irene 22 25 arts arts eng., spanish eng., spanish
and roughly be able to meet every criterion. We are only swapping the engineering and computer science students (nothing severe).
Now, imagine doing this for over 100 students, which is a situation that comes up almost every semester for most universities that participate in student mobility programs (e.g. Erasmus, Conahec, Nordplus). This turns out to become fairly complicated since the number of ways to perform the pairing grows factorially, i.e. very fast.
Demo
(edit 2016: the demo is still live but it does not run anymore)
I've made a small live preview of Buddy AutoMagic available on this Google
spreadsheet. All the functionality of the program can be accessed through
the Affinity
menu (requires you to be logged in with your Google account).
The data is made up and shows how the system can work in real life. Each sheet has a specific purpose
Assignments
is where the pairing appearsLocals
andInternationals
is where the data residesQuestion Importances
is where you configure the weight of each questionAffinity
is where you see the 'affinity' score of all possible pairsVariables
is where you configure the question columns to dictionary keys in the underlying code
Unfortunately if you want to run the demo yourself, you will have to give the program permission to read all of your emails. This is not entirely necessary for the demo to run, but a useful feature if you wish to notify people of their buddy assignment.
If you are curious to know more how it works please contact me. I would be very excited to see it put to use in more places! The universities which have adopted it so far are Reykjavik University (my uni) and NTNU in Trondheim, Norway.
What is happening in the background?
The general idea to solving this problem is that if we can quantify the similarity or affinity between each possible pair, we can find a pairing that maximises the outcome for everyone.
To quantify the affinity between two persons, we assign weights to each criterion. For instance, we could add 5 points if both students come from the same faculty, 5 points for each common language they speak and also 5 points if their age is less than three years apart. Continuing with our previous example, we would obtain table like this
Irene Jonas Sara
Björn 10 15 15
Ólafur 5 15 10
Hallveig 20 10 5
A similar table can be seen on the Affinity
sheet in the demo above.
Now, an optimal pairing is clearly one that selects the maximum value for every row. In this case it's 15 + 15 + 20 = 50 which is the same pairing as we figured out above.
One might quickly assume that this turns out the be the result in most cases and an optimal solution is to simply greedily pick the highest values first. A small counter-example is sufficient to show why this is not a clever idea.
C D
A 10 9
B 8 2
By picking A-C
we may get 10 points but we are also forced to pick B-D
which
only yields for us 2 points. An optimal pairing on the other hand, A-D
and
B-C
, yields a whooping total score of 17 points!
Maximum weighted bipartite matching
If we formulate the problem as here above, our optimal pairing lies in finding a maximum weighted matching on a complete bipartite graph. On one side we have the locals and the international students on the other side. We then assign the affinity score for each pair to be the cost of the edge connecting them. We leave the capacity for each student to be only one, if we wish to pair one local student with multiple international students we can do multiple runs.
Now, we can use this min cost max flow algorithm to solve the problem.
The running time is O(|V|^4 * MAX_EDGE_COST)
which one might think to be too
slow for any reasonably sized pairing. It turns out however that the algorithm
finds a pairing within just a few seconds when V=200
and MAX_EDGE_COST=1000
.
I'm assuming one reason for this is that we may be cutting some running time by
having all edge capacities carry the value 1, i.e. we are in fact only finding
the minimum cost, not max flow. Also, it may be that the upper bound is not very
tight.
It would be interesting to analyze the running time further. We will let this do for now since it serves out purpose well anyways.
Alternative algorithms
It is worth mentioning that we did also look at some other algorithms.
Hungarian algorithm
The [Hungarian algorithm][hungarian] is the most efficient algorithm for solving a maximum weighted bipartite matching. On the other hand, it also happens to be relatively complicated to implement. For that reason, we chose to stick with the previous algorithm. [Hungarian]: https://en.wikipedia.org/wiki/Hungarian_algorithm
Stable marriage
The first algorithm we looked was the famous Gale and Shapley algorithm for solving the stable marriage (or matching) problem. This seemed to be ideal at first sight. However, we then realised that would require us to flatten the similarity scores down to a preference list. For instance, with the previous table, we would convert it into
Irene Jonas Sara
Björn 3 (1,2) (1,2) # Ambiguous
Ólafur 3 1 2
Hallveig 1 2 3
This has a couple of disadvantages
- It's ambiguous and,
- a lot of useful information (e.g. scores 10, 9 and 2 tell us more than 1, 2 and 3) is left out.
For these two reasons, we decided to look for a better solution.
Future work
In the future, I'm hoping to convert the Buddy AutoMagic scripts into a library. I am also looking to extend the collection of questions that might be useful in domains other than exchange student buddy programs.
Moreover, it would be interesting to research different variations of this problem. For instance, the case where we pair people from the same group (i.e. complete graph instead of bipartite) or if we wish to create teams of k people instead of only pairs of two.
Acknowledgements
I would first of all like to thank the wonderful ladies at the international relations office for allowing me to try out this experiment on the actual Reykjavik University Buddy Program. I would also like to extend my thanks to Bjarki Ágúst, a fellow student, and Ýmir Vigfússon, my computer security professor, who helped me figure out which algorithm to use. Bjarki in addition ported the above min cost max flow C++ code into Javascript.