Talk:Planet landing risk-reward formula

From Ultronomicon
Jump to navigation Jump to search

What a joy to find smoeone else who thinks this way.

Some Plots

Interesting, Fyzix. I've just run some (well, two) quick tests. At this place I've put the results along with the little matlab script which did the job (I love matlab, easy and powerful, which is a good combination, I'm glad I haven't had to do anything so numerically insane as to require fortran). So if anyone wants to toy with this little script, hopefully they have matlab or work at some institution which has paid an exorbitant amount to subscribe for it ;) Feel free to ask here or on my talk page or email me for more info on how to use it, but I put some comments in, and in truth it isn't very complicated.

So, here's what I did (I think, hopefully there are no mistakes). The script populates a rectangle with a uniform random distribution of points, and for each point, it works out what the distance to the closest point is. Then it averages all these to find the mean (closest) separation. The script has three modes. Mode 1 does this once. Mode 2 does this several times with different (user-specified) numbers of points. The result of this test can be seen here, called varynumber.jpg. Strong dependence throughout the range. Perhaps we could fit a power law through it. The second test I did was with the script in mode 3, which varies the size of the rectangle (at present this is just a square, though). The result of this can be found here, called varysize.jpg. So this investigates Fyzix's point above. Strong dependence again throughout the range, looks linear.

In view of the latter result, Fyzix is right that uniform distribution is possibly not a great assumption, and while this wasn't unexpected, we've been motivated by a need to keep the formula tractable. But perhaps we could use the data from the game's source and run something like this script on each planet? A quick note on something which I'm not sure has been pointed out, we would have to subtract the number of return trips from the number of distances. Finding the number of return trips (I don't know, number_landings) might be the first step of the formula. But then we still get some extra time when the lander doesn't land exactly where it's supposed to, so subtract number_landings*separation_mean and add number_landings*distance_landingerror perhaps to get the distance (and therefore time, hence hazard) estimate. On a very hostile planet with valuable resources, one can be reduced to landing for just one deposit at a time, taking separation_mean out of the hazard calculation. But this variable would still have been important in the first place for dictating such a mode of operation originally.

Worthy points about crew_bailout there, Fyzix, I agree about how crew_bailout would feature. I was wondering about factoring in the cost of the lander too, but haven't wondered very much. I think we should put a term in for it. It'd be interesting to see how often it is of consequence. And that's a good point about not needing to worry about the human factor (phew).

Of course, this script doesn't work out what happens for the geometry of a planet in sc2, with the wrap-around. Also it doesn't give the total time as Fyzix was saying, for that one would have to multiply the mean separation by number. So, the dependence in varynumber.jpg gets reduced, and the dependence in varysize.jpg gets increased (underscores your thoughts against assuming uniform random distribution, Fyzix).

Happy to take suggestions on any similar investigations. Maybe I'll look into doing total time directly. PsiPhi, you made your own starmap? Me too, ah, the memories, I'd guess many of us did! --Zeracles 01:05, 21 October 2007 (CEST)

Losing Landers

When thinking about factoring in the cost of the lander, we need to think about the probability of losing it (this isn't really new, for factoring in the cost of crew, we use frequency of hazards which is related to probability per unit time of hazard striking). For thinking about this, I guess one could think about what has to happen for a lander to be lost. Hazards have to conspire to destroy the lander within a time such that the player doesn't have time to react (I'll just say time_reaction for now). We could call this conspiracy event_loss. The probability of event_loss increases as the lander loses crew. The probability of losing the lander, then, is the integral of this probability over the time the lander spends on the surface (so we just sum this probability, which is changing, over the lander's time on the surface). Sound fair? I'll go on if no-one wants to add anything. --Zeracles 01:44, 30 October 2007 (CET)

I know I said in my edit of october 18 (when speculating about what crew_bailout would have to be) that it probably wasn't a good idea to go down the path of thinking about Poisson processes, but now I'm having second thoughts.

I haven't looked at the code, but presumably this is how the hazards work. This just means that the probability per unit time of a hazard occurring is constant. A good example of this is particles from radioactive decay, or things popping up in hyperspace within a sphere of influence (I assume, but Slylandro probes are not a good example, see fact 2 here). We want know what the probability of event_loss (see above) is. Presumably (and someone could check this), the rates of the hazards are characterised by the probability of them occurring per unit time such that for each unit of time there can be said to be an expected number of occurrences. Then the probability of actually getting some number of occurrences may be found by using the first equation here (this tells you how likely lambda occurrences are when you expect k).

So let's assume for a simple example that on some planet there is only one hazard, and it does one damage every time it happens. Its occurrence is a poisson process. The chance of losing the lander when it has x crew is the chance of the hazard striking x times within time_reaction. We can find this probability (by using that equation I mentioned) because we know how often the hazard usually strikes within time_reaction (probably because we just looked at what "class" the planet was for this particular hazard).

The punch line is that this simple example is straightforwardly generalised to many hazard types simultaneously, and by multiplying by how much these hurt. I'll go further. Now that we know this, we can find an optimal crew_bailout. The reward for staying on the surface is the value of the next reward. From this we would subtract the likely crew losses and (probability of losing the lander)*(lander cost). Probability of losing the lander increases of course as one loses crew. So we subtract this risk from the reward and if our answer is positive then it's a good idea to stay on the surface. crew_bailout is when this goes negative - its value depends on the distances between goodies, hazards and lander cost.

I don't know how complicated that sounds, but I don't think it would be difficult to implement, there are some fiddly issues of course. I think the part about the lander cost influencing crew_bailout is intuitive - whether or not to bailout is going to depend on how worried you are about losing the lander, and this depends on how much it costs.

Also a quick note that it isn't just the cost of the lander, it's also the cost of losing all the things you've picked up. People are still interested in this right (I'm happy to explain anything which seems confusing)? This is interesting enough for me to go on regardless, though! --Zeracles 22:49, 3 November 2007 (CET)

I am, of course, still interested, but as I said earlier, my skills in this are not the best. This is why I never proposed an actual equation, even rudimentary. I figured a person (or a group) out there with better knowledge and understanding would be able to work on this and create something usable and sensible. I thank you for all your time and effort, and I look forward to further discussion. I'm hoping someone (perhaps you Z.) will start to elaborate on the equation beyond what little I currently have. If not on the article page, at least here, proposing how it might look, giving possible examples, so that the less inclined, such as myself, might have a better idea of where your thoughts are going, and what it might look like. Thanks. --PsiPhi 23:20, 5 November 2007 (CET)
While I still find this interesting, I'm going to have to respectfully bow out of any in-depth discussion as my real life grindstone (ie grad school/research) has become at least an order of magnitude more complex and involved. That said, I'll probably still throw in a couple pennies here and there, but honestly I think Zeracles has the best set of tools to tackle the nitty-gritty of this problem. On a more related tangent, I think you're onto a very important subtlety of the hazard of losing a lander - losing all the resources that would already have been collected, which can drastically reduce the reward of the planet. On the other hand this might be equivalent to increasing the net cost since one alternative to avoid this drastic loss is to do multiple, single mineral node runs, such that instead of the lost lander cost, you get a larger overall landing cost. Anyways, looking at your post below, I would say that this could easily fit into the iterative, computational solution you propose. I'd be interested to see at what values this strategy becomes more cost effective than the traditional one. Okay, enough rambling from the person who just recused himself from doing the heavy lifting - whatever you come up with in the end I'm sure would be much better than what I could conceive. Have fun, good luck, happy hunting, break a leg, win awards and all that jazz. --Fyzixfighter 06:35, 7 November 2007 (CET)

State of Play

Fair enough my friend, I wasn't getting annoyed or anything, I was just wondering if people were starting to consider this too hard to take seriously. This is a fun problem. Here I will consider many of the things which have appeared on this page (a review article!), and elucidate my humble thoughts.

PsiPhi's excellent idea is about a choice, whether or not to land on a potentially hostile planet. This choice could, one supposes, be made via a formula bearing a true or false (1 or 0) outcome. However, we have agreed that it will be best to have positive and negative outcomes, where positive is good, negative is "don't go there" and zero is "break even". Fyzix has cleverly suggested that the outcomes of such a formula need not labour the player's skills overmuch, since skilled players will be the ones who are game enough to tackle planets with the lower values, et cetera. Devices won't be considered, as they have plot-advancing value (as Valaggar pointed out), and effectively have infinite value I guess.

Fyzix proposed a first version of the formula in his edit of october 18, weighting hazards by their frequencies. However, we note that some parts of the formula may be conditional; we may need to think a bit more about the control structure. This takes nothing away from Fyzix's suggestion of course.

Some testing I've done (prompted by Fyzix) suggests that assuming an even distribution of minerals isn't a good idea. The value of the variable crew_bailout has been an object of some speculation, and I've suggested a way of finding this which factors in the cost of the lander. At this point, I think progress is encouraging and I agree with PsiPhi that there are enough pieces of this for us to begin assembling a sequel to Fyzix's first version. --Zeracles 16:16, 6 November 2007 (CET)

Pronumerals

So, I think we should begin by adopting some pronumerals for these variables, to make this easier on the eyes. I've noticed that variables have been scattered throughout this page, and it would be good to just be able to go to one place to find the variable definitions. Also a place where we can change them at will, I'll list some of these below (starting with Fyzix's variables!), and if anyone has more appropriate names, just change them. Also, we can add more variables here as they become necessary. --Zeracles 16:23, 6 November 2007 (CET)

General

  • RR=risk-reward=an array of RU outcomes for exploiting a planet or moon with repective strategies - can take positive or negative values
  • S=strategy=strategy space=a vector-valued quantity which describes strategy - alternatively, this may be thought of as a space which contains the strategies which might be employed by a player - individual positions (vectors) correspond to individual strategies

Rewards

  • M_r=mineral_reward=total RU value of all the minerals (duh)
  • m_r=average mineral reward per deposit (the above divided by number of deposits)
  • n_m=number of mineral deposits
  • M_p=total mineral parts, i.e. how much space they take up in the lander
  • m_p=average mineral parts (the above divided by number of deposits)
  • B_r=bio_reward=total bio value or (bio value)*40 to convert to RUs
  • b_r=average bio value per lifeform (the above divided by the number of lifeforms)

Costs (and cost related)

  • l_c=landing_cost=(fuel to land)*(times to collect everything (may differ if collecting bios))*20 RUs
  • n_l=number_landings=number of landings required to collect everything
  • d_m=mean separation between minerals
  • t_m=time_mineral=not a cost, but dependent on the distribution of minerals
  • t_b=time_bio=dependent on distribution, total hit points, and mobility (and maybe attitude) of the bios
  • H=hazard=time*(frequency)*crew cost RUs (I'm guessing the class #/temperature determines the frequency of the hazards. The frequency then is how many of that type of hazard the lander will encounter each second.)
  • d_n=damage_node=damage sustained during a single mineral node run
  • b_h=bio_hazard=time*(danger weighted by mobility and attitude)*crew cost RUs
  • t_sl=time_safelanding=time spent on the surface when filling lander to capacity, ignoring hazards
  • c_b=crew_bailout=when crew gets to here, bail out
  • t_b=time taken to reach crew_bailout
  • c_p=crew_pre-emptivebailout=if after picking up something, crew gets to here, return to the ship rather than staying on the surface to get more

Lander Parameters

  • L_$=lander cost
  • L_s=lander speed
  • L_c=lander capacity
  • d_le=distance_landingerror=how much on average the lander misses landing point

Control Structure

I think the main question here is the number of landings, and whether this is determined by the capacity of the lander, or by the hazards. On a planet which is relatively safe, the former will be true, and the latter when it's dangerous. I've been thinking about a mathematical way of doing this, and I don't see an easy way (to the rigour which I've been looking for at least). What I suggest we do is trial and error.

Tell our formula to do rewards minus risks on the assumption that we ignore the hazards no matter how dangerous they are. Include all rewards and hazards, as well as the risk and cost of losing the lander (on a dangerous planet, the result may be that the lander gets destroyed pretty well every time). From this we will get an RU estimate of the risk/reward. Then tell it to iteratively try the same thing with each possible crew bailout. We then have several RU estimates. Take the maximum of these, and there's our answer.

I realise this could be seen as a really cheap tactic to solve the problem, but what this does is turn a computational problem into a more mathematical one, to my mind (i.e., my suggestion is to do everything and take the best option rather than do this conditionally). I'm having fun with this problem, cheers everyone! --Zeracles 16:31, 6 November 2007 (CET)

Strategy Space

That's a shame (re Fyzix's edit under Losing Landers). At least an order of magnitude! You know, I'm only part time these days, because those sorts of things happen. I think it's a very underrated undertaking, to work full time, or am I just too laid back? It's no coincidence that I recently switched to part time and started editing the ultronomicon a bit later, and life has been so much better. I couldn't be paid enough to give this up. In what we do, Fyzix, research, I found the situation intolerable. There are few things worse than having to tackle tricky/tough research problems while under the pressure of time, under the pressure of your supervisor wanting to publish. And research problems are fun (like this planet lander problem) if you are allowed to take a step back and really think about them, potentially not so if you aren't. I have a bit of story here - at the end of my honours project last year, my supervisor told me that I could go ahead and write a paper. What ended up happening was, because I was part time and had enough time to think a little, I was able to come up with a new algorithm for characterising galaxy distributions. Fast forward to now, we've just got a referee's report on this paper, and the referee reckons that this algorithm is the strongest point of the paper, and it is the basis of my work at the moment. It goes to show that in research at least, it can really pay to take it slow if you see something which you'd like to think about a bit more, despite the constant pressure to publish.

I think you're more experienced on the research path than myself, Fyzix, so this is probably nothing new :)

But back to business - with my suggestion about iteratively trying strategies, this is like having a "strategy space". One could think of the RU estimates which we get as a way of choosing an optimal strategy from this space. In my last edit I really only proposed one variable for this space. This is so because what I said reduces to the following: land, collect stuff until one of two things happens:

  • lander capacity is reached
  • crew_bailout is reached

The case I spoke of where one simply ignores the hazards is equivalent to setting crew_bailout = 0. So, because lander capacity isn't a variable, our strategy space has just one dimension.

It isn't this simple, and Fyzix has just pointed this out by citing the case of deciding to do "multiple, single mineral node runs", in his words. I'll put this in perspective - once one has picked up something and has enough space for more, does one get more and risk losing the lander, crew and what has already been gained, or return to the ship and attempt to land close to the next piece? One can generalise what Fyzix is thinking of here to the possibility of not just single mineral node runs, but perhaps to "some integer" node runs.

How to put this in our strategy space? Putting ourselves in the player's place, this amounts to, I think, a kind of pre-emptive crew_bailout. I'll call this c_p. I've added it to the list of variables, but -

c_p=crew_pre-emptivebailout=if after picking up something, crew gets to here, return to the ship rather than staying on the surface to get more

This c_p becomes another variable in our strategy space, which is now two-dimensional (it's kind of degenerate where crew_bailout is greater than c_p). c_p = 0 corresponds to the strategy where one isn't thinking about doing this at all, and where one is only prepared to return to the ship when crew reaches crew_bailout or lander capacity has been reached.

Effectively, coming up with this formula now has two parts.

  • come up with all the possible strategies
  • figure out what happens for each of them

I may be silent for the next two days or so as I need to prepare to chair a research meeting, but I think this is open for people to think of other strategies to add here. If anyone would like to propose them, I can think of how to "parametrise" them and work them into the strategy space. --Zeracles 10:50, 7 November 2007 (CET)

Modeling What Happens

This may not be the most exciting SC development for all of us, heh, but I have written a (matlab) code which models what happens for the strategies of a given strategy space, which can be found here (I couldn't help myself!).

I'll briefly describe what the code does (I'll list all the assumptions made below). First define dimensions of the planet. Then define the "catalogue" of stuff on the surface. Say something about hazards. The variable "inputmode" determines whether this is input from maybe the game's source or something we're making up for testing purposes. Then define our strategy space and the values which we will sample. Then input reward and cost values (rewards will come from catalogue data when we do this for real). Under "FUN WITH DISTANCES" we take all the coordinates of all the catalogue positions, find the nearest distance to the nearest other position for each position, average these (the same thing I described in my edit of october 21). Lastly we get to the simulation bit. The process here is to try all the strategies of our strategy space and see what happens. So here is a nested for loop ("for" every strategy). The exact details of this will vary as the various assumptions of the code are ticked off. Rather than spam this page with every detail of the changes I make, I'll give more detail on this little web page (which you can see now for details of precisely what the code does at the moment), maybe making notes on the list which I'll put together below. --Zeracles 21:30, 12 November 2007 (CET)

Oh and I just want to say that I will illustrate all this with a nice example to make it accessible to all (because preliminary results are promising!), so uhh, maybe wait until I've done this. While you're waiting, maybe read this story or something. --Zeracles 21:39, 12 November 2007 (CET)

I have now put an example of this here, ask me if you have any questions. --Zeracles 22:53, 16 November 2007 (CET)

I've done some variations on this example here. I demonstrate that with this approach we could take into account changing mineral abundance, hazards and melnorme tech. I realise that I haven't really demostrated yet that this iterative approach is really better than a conditional formula, but I think we are starting to see effects which would be troublesome to do with a formula (I mean the areas with same pixel values) as this gets more precise. --Zeracles 23:17, 20 November 2007 (CET)

Below I'll list the assumptions which the code makes (not an exhaustive list), I'll probably add more, cross off and modify them as I go. The code is on that web page; I'm happy to plough through these, but of course if any of you have any suggestions, they are welcome. This doesn't list the strategies which still haven't been listed fully, or the information we would need from the game source (this would be a separate list, which people can feel free to start below this one). --Zeracles 06:04, 22 November 2007 (CET)

  • crew costs nothing
  • only one hazard type
  • geometry of a planet is square
  • no wrap-around for a planet, so the edges can't be crossed
  • ship can carry infinite number of landers, infinite crew, infinite fuel
  • zero turning time
  • all mineral deposits are the same
  • no bios
  • need to work in a proper time thing, i.e., crew losses are a subtraction of hazard frequency from crew
  • hazards are treated only as frequencies (which is probably fine until crew gets low)

And while I'm in the mood for lists, here's a list of the things we need to know from the game source (again, I don't think this is all). Again I can look into this myself, but anyone feel free to have a look, as I know nothing about the source. Remember, I'm happy to do it all but at the same time I'm not trying to monopolise this discussion, there is still room for suggestions on strategies to build into strategy space, modeling what happens and the blanks listed below. --Zeracles 05:31, 23 November 2007 (CET)

  • how do the hazards behave?
  • are the hazard frequencies governed by poisson processes?
  • how do the "classes", temperature et cetera affect how bad the hazards are?
  • how does gravity affect how much fuel it takes to land?
  • what are the precise rules for how much they hurt?
  • and how does this change with melnorme tech?
  • what are the lander parameters?
  • speed?
  • turning rate?
  • firing rate?
  • and how do these change with melnorme tech?
  • how do weather and/or gravity affect error in landing position?
  • how do the bios behave, exactly?
  • what are the dimensions of a planet (I just mean width and height in the box on the screen)?
  • and obviously, all the raw data for each planet and moon

Formula Construction

Basic elements

This iterative approach was supposed to be a way of not having to come up with a formula. But having seen how the risk-reward can be so easily visualised, and realising that we can come up with a formula which fits what we see, it looks like we could construct a formula in parallel with the code development, making the formula more and more complex as we do likewise with the simulation. The code will be an easy way of making sure we don't get it wrong (I seem to be laying out a lot of things without actually doing anything!). Now, as soon as I learn how to make equations in wiki . . . --Zeracles 05:09, 30 November 2007 (CET)

I begin setting this up below - the latex version looks nicer.

For any given planet or moon along with the attributes of the lander depending on melnorme tech, one can build up a map of what will probably happen for each possible strategy in strategy space. Performing this analysis leads to a definite conclusion on two things.

  • The ``risk-reward" in RU for the planet - an indication of how worthwhile landing on the planet might be. This is the value of the brightest pixel on the map in strategy space. This is really a relative indicator - whether or not a player should land will still depend on that player's skill.
  • The best strategy for the conditions, indicated not by the value, but by the location of the brightest pixel in strategy space.

Now we will put this in mathematical terms. Suppose we are dealing with a two-dimensional strategy space defined by, as in first example, crew bailout (cb) and pre-emptive crew bailout (cp). Let (i,j) be an index which identifies each strategy, with i labeling cb and j labeling cp. The RU value of the risk-reward (RR) for a particular strategy defined by the pair (i,j) can be denoted by RRi,j. Thus, the RU risk-reward for a strategy in which one bails when crew is down to 5 in any conditions, and when one bails when crew is down to 10 immediately after collecting a deposit is RR5,10. We can call the corresponding strategy Si,j.

Now we can express the two items in the above list using this formalism. The risk-reward for a planet is max(RR), where we treat RR as a set of numbers, one number for each strategy. The (i,j) which identifies the best strategy is that pair for which

max(RR) = RRi,j.

--Zeracles 19:46, 3 December 2007 (CET)

How to get formulae

I've updated the latex version.

Given an n-dimensional strategy space, we have an n-dimensional surface formed from individual RR values specific to each strategy. We can do calculus with this surface (at least, I hope, I haven't thought about the details yet). For this, we are interested in the derivative of RR with respect to S, dRR/dS - S being vector-valued makes this is a vector calculus problem. We are motivated by the relation above, max(RR) = RRi,j, which describes the maximum value taken by our surface, and many of us will know that calculus is often a good tool for finding maxima. The first derivative

dRR/dS = 0

will give us local maxima and minima. A condition on the second derivative

d2RR/dS2 < 0

tells us which of these are maxima. Both of these conditions must be satisfied for local maxima in our surface. We want the maximum of these for max(RR) (which is also the maximum of the maxima and minima, so depending on what's fastest, we may ignore the second condition). This is where this becomes a conditional formula, as we have long suspected - there will be certain conditions under which some maxima are higher than others. --Zeracles 20:48, 12 December 2007 (CET)

The RR Surface

I've updated the latex version (which is superior because it contains figures).

The first step is to describe the RR surface with a formula. As I think I mentioned somewhere on the little web page I made for this, we already have this for the first of the figures here. For just the smooth part, everything but the pixel in the top left hand, this is

RR = mrnm - lcnl.

But we need to have RR in terms of the strategy space variables (S).

RR = mrnm(S) - lcnl(S)

For this part of the RR surface, all the minerals are collected from the surface, so this is constant. On the other hand, nl has a surface of its own in strategy space, which one can see in the latex version. And I'll be back when I know what this surface is in terms of S (simple, probably, but I don't know yet). --Zeracles 17:31, 10 January 2008 (CET)

In terms of S

Here's a chain of reasoning which puts n_l in terms of c_b and c_p.

  • number of landings = total minerals on the surface / amount collected with each landing
  • amount collected with each landing = (1 + number of single node mineral trips) * amount at each deposit (plus 1 at the beginning because the lander picks one up instantly, for the moment)
  • number of node trips = min(number of node trips until c_b forces bailout, number of node trips for c_p to force bailout, number of node trips until lander capacity is reached)
  • number of node trips until c_b forces bailout = (12 - c_b) / damage for each node trip
  • number of node trips until c_p forces bailout = ceil((12 - c_p) / damage for each node trip)
  • number of node trips until lander capacity is reached = (lander capacity - amount at each deposit)/ amount at each deposit

In the second last item, ceil just rounds up to the nearest integer, so ceil(1.5)=2. The difference between the last two items comes from what c_b and c_p are, exactly. c_b can force bailout in the middle of a node trip, while c_p only does it immediately after the lander picks something up. As an example, if 2 crew are lost on each node trip and c_p = 9, the number of node trips until c_p forces bailout is not 1.5 but 2.

All this means that we can write an equation for the RR surface in terms of S. I'll test it first. --Zeracles 18:27, 24 March 2008 (CET)

Don't know how to display an equation like this in wiki, but you can see it at the end of this. To put it in context, we have an extremely rough expression for the worth of exploiting a planet, for each possible strategy. One could take the maximum value of RR as a measure of what a planet is worth, and the strategy which allows this as the optimal strategy. Better would be a formula which spits both of these out. --Zeracles 22:22, 20 April 2008 (CEST)