Difference between revisions of "User:Zeracles/Starmap/Minimal Spanning Tree"

From Ultronomicon
Jump to navigation Jump to search
(Boruvka's algorithm is faster, at least the way I've done it)
m (→‎Code: adjust links)
 
(5 intermediate revisions by the same user not shown)
Line 1: Line 1:
I originally coded up [[wikipedia:Prim's_algorithm|Jarnik/Prim's algorithm]]. In the process of optimising it, I unwittingly reinvented [[wikipedia:Bor%C5%AFvka%27s_algorithm|Boruvka's algorithm]], which handles large numbers of input points better, at least the way I've done it.
+
I originally coded up [[wikipedia:Prim's_algorithm|Jarnik/Prim's algorithm]]. In the process of optimising it, I unwittingly reinvented [[wikipedia:Bor%C5%AFvka%27s_algorithm|Boruvka's algorithm]], which handles large numbers of input points better, at least the way I've done it. Demonstration [http://www.star-control.com/forum/index.php/topic,1660.msg35209.html#msg35209 here].
==Boruvka's Algorithm==
+
==Minimally Connected Opaque Friend Graph==
Also downloadable [http://www.physics.usyd.edu.au/~asmith/files/sc1/MST_Boruvka.m here]
+
The matlab/octave code below for Boruvka's algorithm also contains an option to create a Minimally Connected Opaque Friend Graph (MCOFG). This is a graph I invented for the purpose of adding links to a completed MST to get closed loops, which were present in the original Star Control 1 starmaps. Actually, the MCOFG has another very interesting use in my real life work, but that's another story.
% minimal spannng tree in matlab
+
 
% Boruvka's algorithm (independently invented by yours truly)
+
The basic properties of the graph are
% see http://www.star-control.com/forum/index.php/topic,1660
+
*all the points are connected
% by Anthony Smith (Zeracles)
+
*each point is directly linked to ``nearby" points
% astroant@hotmail.com
+
*no individual link occupies a path in space similar to that traced by other link(s)
+
It is ``minimal" because only close neighbours are connected, and with ``opaque friends" because links can't go ``through" nearby points (the ``friends"). Graph geeks will realise that the Delaunay Tessellation (which we also considered) seems to satisfy these criteria, but it has edge effects; it will produce a convex hull at the edges, linking points that may be nowhere near each other, meaning that it only partially satisfies the ``nearby" criterion. The MCOFG is also far easier to implement than the delaunay tessellation! The MST is a good starting point in the construction of this graph, because it guarantees the connection of all points. It is also used to define ``nearby" and thus ``friends", by using the length of the longest MST link hosted at any given node to define a radius within which links to additional points will be considered.
% if only after an MST, leave separation=pruning=0
+
 
+
So, to generate the graph, we first generate an MST. For each point, we examine its MST links, and remember the length of the longest one. Points within this distance around the point are considered local, and potential friends. To determine whether each of these potential friends should be linked to the point, we consider all points that are closer. If r_2 is the distance to the point we are considering linking, and r_1 is a point closer to the point we are finding friends for, and \theta is the angle between vectors '''r_2''' and '''r_1''', we compute the product
load('inputpositions')
+
<br>((r_2)/(2r_1))*((\pi/2)/(\theta))
+
<br>If it's less than one, the link '''r_2''' is allowed. When there are many points closer to consider, the products for each of them are multiplied together, and if the result if less than one, the link is allowed. This approach is motivated by the third condition for this graph, that no individual link occupies a path in space similar to that traced by other link(s). If one has points 1, 2, and 3, in that order on a straight line, 1 should not be linked to 3, because this duplicates the paths 1 to 2 and 2 to 3. With our approach, 1 will not be linked to 3 because 2 is closer to one, and the angle \theta is zero, so the product will be very much greater then one.
localcount=8;
+
 
+
The motivation for the inclusion of the distance component (r_2/r_1) is that if two points subtend a small angle \theta, but are at similar distances, the link to the more distant one shouldn't be broken. As for normalisation, the factors of two in the angular and distance components are there so that we can actually get values of less than one.
separation=0;
+
 
separationlength=0.03;
+
There are also two numbers that can be tweaked to vary the connectedness of the graph.
+
The product is divided by ``friendliness" f, i.e.
pruning=0;
+
<br>((r_2)/(2r_1))*((\pi/2)/(\theta))*(1/f)
prunesteps=1; % how many times to have a go at the tree
+
<br>to vary the number of links. Higher values of f lead to more connections, more friends. The locality factor l influences the radius within which points are considered for friendship, which is by default the length of the longest local MST link. l = 2, say, would double this.
prunelevel=10; % length of branch to cut each time
+
 
+
==Efficiency Method==
% detect input properties and set up index
+
In discussion on the forum Death 999 contributed a method that has become our preferred method for closed loop generation - I'll insert a description when I get a moment. It is implemented in both versions of the code available below.
sizeinputpositions=size(inputpositions);
+
 
lengthinputpositions=sizeinputpositions(1);
+
==Code==
dimensions=sizeinputpositions(2);
+
When prototyping, I wrote a [http://www.physics.usyd.edu.au/~asmith/files/sc1/MSTclosure.m matlab/octave version available here], and have recently done a [http://www.physics.usyd.edu.au/~asmith/files/sc1/MSTclosure.c C conversion available here].
 
index=transpose(linspace(1,lengthinputpositions,lengthinputpositions));
 
 
inputpositionxs=inputpositions(:,1);
 
inputpositionys=inputpositions(:,2);
 
if dimensions>2
 
inputpositionzs=inputpositions(:,3);
 
end
 
 
% initialise search parameters
 
minx=min(inputpositionxs);
 
maxx=max(inputpositionxs);
 
xextent=minx*maxx;
 
 
miny=min(inputpositionys);
 
maxy=max(inputpositionys);
 
yextent=miny*maxy;
 
 
if dimensions>2
 
minz=min(inputpositionzs);
 
maxz=max(inputpositionzs);
 
zextent=minz*maxz;
 
end
 
 
if dimensions==2
 
measure=xextent*yextent;
 
localmeasure=measure/lengthinputpositions;
 
searchincrement=((localcount*localmeasure)/pi)^(1/2);
 
end
 
if dimensions==3
 
measure=xextent*yextent*zextent;
 
localmeasure=measure/lengthinputpositions;
 
searchincrement=((localcount*localmeasure)/(4/3)/pi)^(1/3);
 
end
 
 
treenumber=index;
 
numbertrees=length(unique(treenumber));
 
branches=[];
 
searchfactor=1; % setting this to zero can lead to infinite loop if this is scaled multiplicatively
 
searchfactors=zeros(lengthinputpositions,1);
 
 
while numbertrees>1 % each input point is treated as a separate tree, and we join them together until we only have one tree
 
searchfactor=2*searchfactor; % calculate interpoint distances for ``close" points, with ``close" defined by this variable
 
while min(searchfactors)<searchfactor % searchfactors stores the searchfactors searched for each existing tree
 
tree=treenumber(min(find(searchfactors<(searchfactor)))); % choose a tree to search which has been searched with the smallest distance so far
 
treesearched=0;
 
while treesearched==0 % while we're not yet done looking around the nodes of this tree for links
 
maskvector=treenumber==tree;
 
treeindices=find(maskvector);
 
lengthtreepositions=sum(maskvector);
 
 
remainingpositionxs=inputpositionxs(find(maskvector==0));
 
remainingpositionys=inputpositionys(find(maskvector==0));
 
if dimensions>2
 
remainingpositionzs=inputpositionzs(find(maskvector==0));
 
end
 
 
remainingpositionindices=index(find(maskvector==0));
 
 
mindistances=[];
 
for i=linspace(1,lengthtreepositions,lengthtreepositions) % search around each of the tree positions for remainingpositions
 
% retrieve coordinates of a tree position
 
treepositionx=inputpositions(treeindices(i),1);
 
treepositiony=inputpositions(treeindices(i),2);
 
if dimensions>2
 
treepositionz=inputpositions(treeindices(i),3);
 
end
 
treeindex=treeindices(i);
 
 
% search for remainingpositions
 
boxradius=searchfactor*searchincrement;
 
 
boxminx=treepositionx-boxradius;
 
boxmaxx=treepositionx+boxradius;
 
boxminy=treepositiony-boxradius;
 
boxmaxy=treepositiony+boxradius;
 
if dimensions>2
 
boxminz=treepositionz-boxradius;
 
boxmaxz=treepositionz+boxradius;
 
end
 
 
if dimensions==2
 
closerremainingmask=(boxminx<=remainingpositionxs).*(remainingpositionxs<=boxmaxx).*(boxminy<=remainingpositionys).*(remainingpositionys<=boxmaxy);
 
end
 
if dimensions==3
 
closerremainingmask=(boxminx<=remainingpositionxs).*(remainingpositionxs<=boxmaxx).*(boxminy<=remainingpositionys).*(remainingpositionys<=boxmaxy).*(boxminz<=remainingpositionzs).*(remainingpositionzs<=boxmaxz);
 
end
 
 
mindistancecandidates=[];
 
mindistancecandidateindices=[];
 
if sum(closerremainingmask)>0 % if there are remainingpositions inside the box, calculate the actual distances to them
 
for k=transpose(find(closerremainingmask==1));
 
branchcandidatex=remainingpositionxs(k);
 
branchcandidatey=remainingpositionys(k);
 
if dimensions>2
 
branchcandidatez=remainingpositionzs(k);
 
end
 
branchcandidateindex=remainingpositionindices(k); % set up to remember the index of each prospective link destination
 
 
if dimensions==2
 
mindistancecandidate=((branchcandidatex-treepositionx).^2+(branchcandidatey-treepositiony).^2).^(1/2);
 
end
 
if dimensions==3
 
mindistancecandidate=((branchcandidatex-treepositionx).^2+(branchcandidatey-treepositiony).^2+(branchcandidatez-treepositionz).^2).^(1/2);
 
end
 
if mindistancecandidate<=boxradius
 
mindistancecandidates=[mindistancecandidates;mindistancecandidate];
 
mindistancecandidateindices=[mindistancecandidateindices;branchcandidateindex];
 
end
 
end
 
end
 
if numel(mindistancecandidates)>0 % minimum distance from this position on the tree
 
mindistance=[min(mindistancecandidates),treeindex,mindistancecandidateindices(find(mindistancecandidates==min(mindistancecandidates)))]; % minimum distances carry the indices of the prospective link destination
 
mindistances=[mindistances;mindistance];
 
end
 
end
 
if numel(mindistances)==0
 
searchfactors(maskvector)=searchfactor;
 
treesearched=1;
 
end
 
if numel(mindistances)>0
 
branchdistance=min(mindistances(:,1));
 
branchindex=find(branchdistance==mindistances(:,1));
 
numberbranchcandidates=numel(branchindex);
 
if numberbranchcandidates>1 % if there's a tie for minimum length
 
branchindex=branchindex(ceil(numberbranchcandidates*rand(1)));
 
end
 
 
% add this branch to the tree
 
branch=mindistances(branchindex,:);
 
branches=[branches;branch];
 
 
% join the two trees together
 
targettree=treenumber(branch(3)); % need to update the treenumber variable, which remembers which points belong to which trees
 
treenumber(find(treenumber==targettree))=tree;
 
numbertrees=numbertrees-1; % so that we are one step closer to finishing
 
end
 
end
 
end
 
end
 
 
% do custom stuff with this tree
 
% separation - remove all branches above a certain length
 
if separation==1
 
separationmask=(branches(:,1))<=separationlength;
 
branches=branches(find(separationmask),:);
 
end
 
 
% pruning - remove wandering branches
 
if pruning==1
 
degrees=[];
 
for i=linspace(1,lengthinputpositions,lengthinputpositions)
 
degree=(sum((branches(:,2))==i))+(sum((branches(:,3))==i));
 
degrees=[degrees;degree];
 
end
 
 
while prunesteps>0
 
protectednodes=index(find(degrees>2));
 
 
tempprunelevel=prunelevel;
 
while tempprunelevel>0
 
ends=index(find(degrees==1)); % find branch end nodes
 
for i=transpose(ends)
 
if sum(i==protectednodes)==0
 
if (degrees(i))>0
 
deletemask=((branches(:,2))==i)+((branches(:,3))==i);
 
cutbranch=branches(find(deletemask),:);
 
 
branches=branches(find(deletemask==0),:);
 
degrees(cutbranch(2))=(degrees(cutbranch(2)))-1;
 
degrees(cutbranch(3))=(degrees(cutbranch(3)))-1;
 
end
 
end
 
end
 
tempprunelevel=tempprunelevel-1;
 
end
 
prunesteps=prunesteps-1;
 
end
 
end
 
 
save('branches','branches')
 
 
% for plotting
 
sizebranches=size(branches);
 
numberofbranches=sizebranches(1);
 
 
figure
 
hold on
 
box on
 
if dimensions==2
 
plot(inputpositions(:,1),inputpositions(:,2),'k.')
 
end
 
if dimensions==3
 
plot3(inputpositions(:,1),inputpositions(:,2),inputpositions(:,3),'k.')
 
end
 
 
for i=linspace(1,numberofbranches,numberofbranches)
 
if dimensions==2
 
startindex=branches(i,2);
 
finishindex=branches(i,3);
 
 
startx=inputpositions(startindex,1);
 
starty=inputpositions(startindex,2);
 
 
finishx=inputpositions(finishindex,1);
 
finishy=inputpositions(finishindex,2);
 
 
line([startx;finishx],[starty;finishy],'Color','k')
 
end
 
if dimensions==3
 
startindex=branches(i,2);
 
finishindex=branches(i,3);
 
 
startx=inputpositions(startindex,1);
 
starty=inputpositions(startindex,2);
 
startz=inputpositions(startindex,3);
 
 
finishx=inputpositions(finishindex,1);
 
finishy=inputpositions(finishindex,2);
 
finishz=inputpositions(finishindex,3);
 
 
line([startx;finishx],[starty;finishy],[startz;finishz],'Color','k')
 
end
 
end
 
==Jarnik/Prim's Algorithm==
 
Also downloadable [http://www.physics.usyd.edu.au/~asmith/files/sc1/MST_JarnikPrim.m here]
 
% minimal spannng tree in matlab
 
% Jarnik/Prim's algorithm - runs pretty slowly for any more than 50 input points
 
% see http://www.star-control.com/forum/index.php/topic,1660.msg33938.html#msg33938
 
% by Anthony Smith (Zeracles)
 
% astroant@hotmail.com
 
load('inputpositions')
 
 
remainingpositions=inputpositions;
 
 
% detect input properties and set up index
 
sizeremainingpositions=size(remainingpositions);
 
lengthremainingpositions=sizeremainingpositions(1);
 
dimensions=sizeremainingpositions(2);
 
 
index=transpose(linspace(1,lengthremainingpositions,lengthremainingpositions));
 
 
remainingpositionxs=remainingpositions(:,1);
 
remainingpositionys=remainingpositions(:,2);
 
if dimensions>2
 
remainingpositionzs=remainingpositions(:,3);
 
end
 
 
% initialise search parameters
 
minx=min(remainingpositionxs);
 
maxx=max(remainingpositionxs);
 
xextent=minx*maxx;
 
 
miny=min(remainingpositionys);
 
maxy=max(remainingpositionys);
 
yextent=miny*maxy;
 
 
if dimensions>2
 
minz=min(remainingpositionzs);
 
maxz=max(remainingpositionzs);
 
zextent=minz*maxz;
 
end
 
 
if dimensions==2
 
measure=xextent*yextent;
 
localmeasure=measure/lengthremainingpositions;
 
searchincrement=((10*localmeasure)/pi)^(1/2);
 
end
 
if dimensions==3
 
measure=xextent*yextent*zextent;
 
localmeasure=measure/lengthremainingpositions;
 
searchincrement=((10*localmeasure)/(4/3)/pi)^(1/3);
 
end
 
 
% initialise with a seed point
 
seedindex=1;
 
maskvector=zeros(lengthremainingpositions,1);
 
maskvector(seedindex)=1;
 
 
remainingpositionxs=remainingpositionxs(find(maskvector==0));
 
remainingpositionys=remainingpositionys(find(maskvector==0));
 
if dimensions>2
 
remainingpositionzs=remainingpositionzs(find(maskvector==0));
 
end
 
remainingpositionindices=index(find(maskvector==0));
 
 
lengthremainingpositions=lengthremainingpositions-1;
 
 
treeindices=index(seedindex,:);
 
lengthtreepositions=1;
 
 
branches=[];
 
 
while lengthremainingpositions>0 % keep going until nothing remains unlinked to the tree (the ``remainingpositions")
 
mindistances=[];
 
for i=linspace(1,lengthtreepositions,lengthtreepositions) % search around each of the tree positions for remainingpositions
 
% retrieve coordinates of a tree position
 
treepositionx=inputpositions(treeindices(i),1);
 
treepositiony=inputpositions(treeindices(i),2);
 
if dimensions>2
 
treepositionz=inputpositions(treeindices(i),3);
 
end
 
treeindex=treeindices(i);
 
 
occupiedbox=0;
 
searchfactor=1;
 
while occupiedbox==0 % search for remainingpositions inside a box that grows until some are found
 
boxradius=searchfactor*searchincrement;
 
 
boxminx=treepositionx-boxradius;
 
boxmaxx=treepositionx+boxradius;
 
boxminy=treepositiony-boxradius;
 
boxmaxy=treepositiony+boxradius;
 
if dimensions>2
 
boxminz=treepositionz-boxradius;
 
boxmaxz=treepositionz+boxradius;
 
end
 
 
if dimensions==2
 
closerremainingmask=(boxminx<=remainingpositionxs).*(remainingpositionxs<=boxmaxx).*(boxminy<=remainingpositionys).*(remainingpositionys<=boxmaxy);
 
end
 
if dimensions==3
 
closerremainingmask=(boxminx<=remainingpositionxs).*(remainingpositionxs<=boxmaxx).*(boxminy<=remainingpositionys).*(remainingpositionys<=boxmaxy).*(boxminz<=remainingpositionzs).*(remainingpositionzs<=boxmaxz);
 
end
 
 
mindistancecandidates=[];
 
mindistancecandidateindices=[];
 
if sum(closerremainingmask)>0 % if there are remainingpositions inside the box, calculate the actual distances to them
 
for k=find(closerremainingmask==1)
 
branchcandidatex=remainingpositionxs(k);
 
branchcandidatey=remainingpositionys(k);
 
if dimensions>2
 
branchcandidatez=remainingpositionzs(k);
 
end
 
branchcandidateindex=remainingpositionindices(k); % set up to remember the index of each prospective link destination
 
 
if dimensions==2
 
mindistancecandidate=((branchcandidatex-treepositionx).^2+(branchcandidatey-treepositiony).^2).^(1/2);
 
end
 
if dimensions==3
 
mindistancecandidate=((branchcandidatex-treepositionx).^2+(branchcandidatey-treepositiony).^2+(branchcandidatez-treepositionz).^2).^(1/2);
 
end
 
if mindistancecandidate<=boxradius
 
mindistancecandidates=[mindistancecandidates;mindistancecandidate];
 
mindistancecandidateindices=[mindistancecandidateindices;branchcandidateindex];
 
end
 
end
 
end
 
if numel(mindistancecandidates)>0
 
mindistance=[min(mindistancecandidates),treeindex,mindistancecandidateindices(find(mindistancecandidates==min(mindistancecandidates)))]; % minimum distances carry the indices of the prospective link destination
 
mindistances=[mindistances;mindistance];
 
 
occupiedbox=1;
 
end
 
searchfactor=searchfactor+1; % make the search box bigger
 
end
 
end
 
branchdistance=min(mindistances(:,1));
 
branchindex=find(branchdistance==mindistances(:,1));
 
numberbranchcandidates=numel(branchindex);
 
if numberbranchcandidates>1 % if there's a tie for minimum length
 
branchindex=branchindex(ceil(numberbranchcandidates*rand(1)));
 
end
 
 
% add this branch to the tree
 
branch=mindistances(branchindex,:);
 
branches=[branches;branch];
 
 
% add that location to the tree
 
treeindices=[treeindices;branch(3)];
 
lengthtreepositions=lengthtreepositions+1;
 
 
% remove that location from the remainingpositions
 
remainingmask=ones(lengthremainingpositions,1);
 
removed=find(remainingpositionindices==branch(3));
 
remainingmask(removed)=0;
 
 
remainingpositionindices=remainingpositionindices(find(remainingmask));
 
remainingpositionxs=remainingpositionxs(find(remainingmask));
 
remainingpositionys=remainingpositionys(find(remainingmask));
 
if dimensions>2
 
remainingpositionzs=remainingpositionzs(find(remainingmask));
 
end
 
 
lengthremainingpositions=lengthremainingpositions-1;
 
end
 
 
save('branches','branches')
 
 
% for plotting
 
sizebranches=size(branches);
 
numberofbranches=sizebranches(1);
 
 
figure
 
hold on
 
box on
 
plot(inputpositions(:,1),inputpositions(:,2),'k.')
 
 
for i=linspace(1,numberofbranches,numberofbranches)
 
startindex=branches(i,2);
 
finishindex=branches(i,3);
 
 
startx=inputpositions(startindex,1);
 
starty=inputpositions(startindex,2);
 
 
finishx=inputpositions(finishindex,1);
 
finishy=inputpositions(finishindex,2);
 
 
line([startx;finishx],[starty;finishy],'Color','k')
 
end
 

Latest revision as of 06:22, 1 September 2013

I originally coded up Jarnik/Prim's algorithm. In the process of optimising it, I unwittingly reinvented Boruvka's algorithm, which handles large numbers of input points better, at least the way I've done it. Demonstration here.

Minimally Connected Opaque Friend Graph[edit]

The matlab/octave code below for Boruvka's algorithm also contains an option to create a Minimally Connected Opaque Friend Graph (MCOFG). This is a graph I invented for the purpose of adding links to a completed MST to get closed loops, which were present in the original Star Control 1 starmaps. Actually, the MCOFG has another very interesting use in my real life work, but that's another story.

The basic properties of the graph are

  • all the points are connected
  • each point is directly linked to ``nearby" points
  • no individual link occupies a path in space similar to that traced by other link(s)

It is ``minimal" because only close neighbours are connected, and with ``opaque friends" because links can't go ``through" nearby points (the ``friends"). Graph geeks will realise that the Delaunay Tessellation (which we also considered) seems to satisfy these criteria, but it has edge effects; it will produce a convex hull at the edges, linking points that may be nowhere near each other, meaning that it only partially satisfies the ``nearby" criterion. The MCOFG is also far easier to implement than the delaunay tessellation! The MST is a good starting point in the construction of this graph, because it guarantees the connection of all points. It is also used to define ``nearby" and thus ``friends", by using the length of the longest MST link hosted at any given node to define a radius within which links to additional points will be considered.

So, to generate the graph, we first generate an MST. For each point, we examine its MST links, and remember the length of the longest one. Points within this distance around the point are considered local, and potential friends. To determine whether each of these potential friends should be linked to the point, we consider all points that are closer. If r_2 is the distance to the point we are considering linking, and r_1 is a point closer to the point we are finding friends for, and \theta is the angle between vectors r_2 and r_1, we compute the product
((r_2)/(2r_1))*((\pi/2)/(\theta))
If it's less than one, the link r_2 is allowed. When there are many points closer to consider, the products for each of them are multiplied together, and if the result if less than one, the link is allowed. This approach is motivated by the third condition for this graph, that no individual link occupies a path in space similar to that traced by other link(s). If one has points 1, 2, and 3, in that order on a straight line, 1 should not be linked to 3, because this duplicates the paths 1 to 2 and 2 to 3. With our approach, 1 will not be linked to 3 because 2 is closer to one, and the angle \theta is zero, so the product will be very much greater then one.

The motivation for the inclusion of the distance component (r_2/r_1) is that if two points subtend a small angle \theta, but are at similar distances, the link to the more distant one shouldn't be broken. As for normalisation, the factors of two in the angular and distance components are there so that we can actually get values of less than one.

There are also two numbers that can be tweaked to vary the connectedness of the graph. The product is divided by ``friendliness" f, i.e.
((r_2)/(2r_1))*((\pi/2)/(\theta))*(1/f)
to vary the number of links. Higher values of f lead to more connections, more friends. The locality factor l influences the radius within which points are considered for friendship, which is by default the length of the longest local MST link. l = 2, say, would double this.

Efficiency Method[edit]

In discussion on the forum Death 999 contributed a method that has become our preferred method for closed loop generation - I'll insert a description when I get a moment. It is implemented in both versions of the code available below.

Code[edit]

When prototyping, I wrote a matlab/octave version available here, and have recently done a C conversion available here.