Fishwerks Solver

User Guide for NEOS Fishwerks Server

Author/Programmer: Austin Milt

Last Updated: October 23, 2017

Intro

The NEOS Fishwerks Server is a service that gives access to the research version of the Fishwerks decision support optimization model. You might use this service to run optimizations for networks outside the Great Lakes Basin, alternative barrier removal cost estimates, additional target species, and others. The purpose of this document is to familiarize you with the research model, required inputs, and outputs enough that you will be able to fully utilize the flexibility of the model.

Getting Started

If this is your first attempt at a barrier decision optimization, I highly recommend starting by playing around in the online Fishwerks decision support tool (at the time of writing, it can be found at https://greatlakesconnectivity.org/). This will familiarize you with the basic inputs and outputs of the model without forcing you to do much prep work. Once you feel you understand what the online decision support tool is capable of, how to use its many features, and that it does not fulfill your needs, only then should you turn to the NEOS FIshwerks Server.

If you are already familiar with the online Fishwerks decision support tool and know that you need a custom model, but have not used the NEOS Fishwerks Server before, start by reading through this entire document, in order, along with inspecting the provided example datasets. Then, run at least two of the example datasets through the service after tweaking some of the data in the input CSVs.

If you have used the NEOS Fishwerks Server before and just need a refresher, I'd recommend inspecting a couple of the example datasets and running at least one tweaked example through the server before creating your own custom datasets.

GAMS Research Model

The current version of the research model is meant to accept many variations on possible types of models. To accomplish this, the naming conventions and dimensions of model parts are probably beyond your usual needs and as a result some additional work will have to be done up front to format your data correctly.

The research model performs a linear integer programming optimization, where the objective is to maximize a combined benefit over multiple beneficiary and/or control targets by doing projects at or above barriers with a limited budget(s) and constraints on target habitat. Barriers included in the optimization may be candidates for one or more projects (e.g. removal, lampricide), or may just be in the same network as candidate barriers. Projects may affect either the passability of a barrier or the amount of potential benefit above a barrier for a target.

GAMS has its own format for interpreting input data and linking that to the optimization model. Your job is to take your own data and convert it into two CSVs (see Data Files section) containing all the data and problem specifications for the particular problem(s) you want GAMS to solve. These CSVs will be run through a script that converts them into GAMS's special format for loading into the model. In particular, your input CSVs will define a series of sets and parameters, which are the two major types of data GAMS uses in optimizations.

The schematic below shows how each of the parameters and/or sets specified in the inputs you upload to NEOS fit together in the model. Names on the first line in each rounded box are the parameter/set name and on the second line in italics are the sets over which the parameter/set is indexed in the order that they are indexed. These indices need to be reflected in your data definitions. Arrows show the direction data flow, e.g. from one set to another if one is a super set of the other or from data into the model.

More information on each parameter/set is given in the glossary after the schematic.

diagram

Glossary

Barriers
dams and road stream crossing culverts at/above which projects are completed; barriers are always organized in a branching network with each barrier having one downstream barrier and any number of upstream barriers
benefitMaxBase
equivalent of passBase for the baseline potential benefit above the Barriers for each of the Targets; used to calculate the total potential benefit above a barrier given any ProjectsBenefit completed at/above a barrier; at the time of writing, this would be e.g. the HAB_UP field, i.e. length of river segment above a barrier, in the Fishwerks database
benefitMaxChange
change in potential benefit (benefitMaxBase) for each of the Targets above each of the Barriers when each of the ProjectsBenefit are completed; at the time of writing, this would be e.g. quality-weighted habitat adjustments for sea lamprey after lampricide application
BudgetNames
names of budgets from which projects may draw funding; projects may draw money from only one budget (ProjectToBudget), so you should have no more budgets than projects
budget
designated amount of money available to complete each of the budgets (BudgetNames); these are floating points indexed by BudgetNames
cap
for Targets with a negative weight, this is a maximum allowed accessibility-weighted benefit; for Targets with a positive weight, this is a minimum allowed accessibility-weighted benefit
cost
cost of completing each of the Projects at/above each of the Barriers; used to enforce budget constraints
Downstream
next downstream barrier from each of the Barriers; first index of this set refers to the source barrier; second index of this set refers to the downstream barrier; at the time of writing, convention for designating a root (isRoot) barrier as the downstream barrier is '-1'
Guilds
names of passability guilds to which targets belong; used in calculations of cumulative passability and habitat access
isCandidate
binary (0/1) designation that a barrier (Barriers) is a candidate for a project (Projects); barriers can be candidates for any number of Projects
isRoot
binary (0/1) designation that a barrier (Barriers) is a root barrier, i.e. has no Downstream barriers; this must be specified in addition to the convention described in Downstream; root barriers get special treatment in the model since the calculation of cumulative passability for other barriers relies on the presence of one or more Downstream barriers
obj2Weight
weight of the secondary objective in the objective function, which maximizes the weighted sum of accessibility-weighted benefits for Targets in GuildsBeneficiary; the weight of the objective for Targets in GuildsBeneficiary is always implied to be 1; this is multiplied by the target-specific weights (weight); this should generally be positive and small while the weight for each of the Targets in GuildsControl should be negative
parameter
one kind of data in a GAMS model; these are always treated as numbers, but could be binary; parameters may have anywhere from zero (i.e. a single value) to any number of dimensions; at the time of writing, the parameters with the largest number of dimensions are benefitMaxChange and passChange, each of which have 3 dimensions
PassabilityTier
relative swimming ability of fish in each of the Guilds; values are integers, and should be separated by one (at most), with the lowest value representing the weakest swimmers and the highest value representing the strongest swimmers; used to facilitate calculations of cumulative passability for undesirable species during optimization
passBase
barrier-specific passability for each of the Targets at each of the Barriers; used to calculate the cumulative passability of a barrier given its Downstream barriers and any ProjectsPassability completed at those barriers; at the time of writing, this would be e.g. the Strong, Moderate, or Weak swimming passability scores in the Fishwerks database
passChange
change in barrier-specific passability (passBase) for each of the Targets at each of the Barriers when each of the ProjectsPassability are completed; at the time of writing, this would be e.g. the default assumed change in passability of a barrier after remove (i.e. 1-passBase)
Projects
projects that can be complete at/above a barrier to affect either the passability (passChange) of a barrier or the potential benefit (benefitMaxChange) above a barrier; projects are broken into two types: ProjectsPassability which affect barrier passability and ProjectsBenefit which affect the potential benefit above a barrier
ProjectsBenefit
one of two types of Projects; those projects that affect the total potential benefit above a barrier; these Projects modify the baseline benefit values (benefitMaxChange) at barriers
ProjectsPassability
one of two types of Projects; those projects that affect the barrier-specific passability for each of the Targets at each of the Barriers; these Projects modify the baseline passability (passBase) at barriers
ProjectToBudget
designated budget name (BudgetNames) from which each of the Projects can draw funding; first index is the name of the project; second index is the name of the budget to which it belongs
set
one kind of data in a GAMS model; all sets are collections of names with a common theme, e.g. Barriers, Projects, Targets, etc.; these are always treated as text and never numbers
scalar
one kind of data in a GAMS model; this is just another name for a zero-dimensional parameter
Targets
target metrics/objects of the optimization; generally these are species, but could be any conservation target
TargetToGuild
mapping from a target (Targets) to its passability guild (GuildsByDesirability)
weight
relative weight of a target in the objective function; sign of the weight affects how a target is processed in the model: positive-weight Targets enter the objective as the value specified and their cap is interpreted as a lower bound, while negative-weight Targets have habitat first multiplied by obj2Weight as part of the secondary objective and their cap is interpreted as an upper bound

Data Files

For a related series of problems that you wish to pass to the NEOS Fishwerks Server, you will need to upload two CSVs:

  • table.csv is an Excel-style CSV containing barrier data in a long table format, e.g. what you would find in the 'barriers' dataset in the Fishwerks Hydrography database.
  • definitions.csv is an Excel-style CSV containing other optimization data, e.g. budgets, as well as a mapping from the column names in table.csv to the model symbols in the research model.

Below I describe how each of these should be formatted and general variations/options that can be used. I recommend having open the example CSVs as read. These have additional information about each input and many defaults to help clarify how they should be specified.

table.csv

This Excel-style CSV contains barrier optimization data as a long table. The basic syntax is simple: each row represents a unique barrier in your barrier network, and columns are optimization data associated with those barriers. If you are familiar with the Fishwerks hydrography database, then this file should already be apparent; it is essentially the 'barriers' dataset with some additional/optional fields. Here are some basic rules about the syntax in table.csv:

  • Generally columns can have any name, but should be unique across the columns to avoid having some data ignored.
  • Each row represents a single barrier and its associated optimization data.
  • Regardless of the habitat optimization model being run, you will have to have a column that specifies a unique ID for each barrier. Most other data could be optional depending on the model being run.
  • Each barrier must have an associated downstream barrier ID to indicate the next barrier downstream from the one identified on the current row. If a barrier is the most downstream barrier in your dataset, you must use '-1' in the downstream column.
  • In addition to the convention of using '-1' to indicate barrier is a root (has nothing downstream), you must have a column that does this same thing in binary (0/1). A simple Excel formula achieves this, i.e. '=if(<downstream ID column>=-1, 1, 0)'.
  • One or more binary columns designate which barriers are candidates for projects, e.g. removal and/or lampricide. Each project should have its own column.

Other required columns will depend on the optimization problems you wish to solve. In general, across table.csv and definitions.csv you should expect to specify data that correspond to each of the 'Input Data' listed and described in the schematic in GAMS Research Model. Specifically in table.csv, you should expect to have columns for the cost of doing a project, baseline passability, passability change for each guild from doing a project, baseline benefit, and benefit change for each target from doing a project.

definitions.csv

This Excel-style CSV contains other optimization data not indexed by barriers defined in table.csv. The specific information needed in this file depends on the GAMS model being run. Generally, any data not already contained in table.csv or the GAMS model file directly must be specified in definitions.csv. Here are the basic rules of syntax for this file:

  • Each row represents a unique combination of (a) model run and (b) set/parameter defined over a single dimension of that set/parameter. For more on these, keep reading and refer to the example dataset.
  • Four columns are required:
    1. Run: This designates the model run for which the symbol in the current row is defined. You must specify at least one run; each run is used to produce a different input file for the GAMS model, and the GAMS model operates on one run at a time. More on this in Run column.
    2. Symbol: This designates the set/parameter for which you are currently specifying model run values. In addition, it specifies which indices are being defined in the following column. More in Symbol column.
    3. Values is Column Name: This designates whether the Values column (next) is to be used directly or if it identifies a column in table.csv. More in Values is Column Name column.
    4. Values: This designates the values of the set/parameter being defined in the current row. Values may be either be directly interpreted in the optimization model or names of columns in table.csv that will be converted to data for the optimization. More in Values column.
  • Any other columns are not used by the service. You will find descriptions for each symbol used in the current GAMS habitat optimization model in a 'Notes' column in the example definitions.csv included.
  • You should omit/delete unnecessary values. For instance, if your series of problems does not include lamprey, you should have no references to lamprey in definitions.csv. This would entail deleting references to lamprey in cap, Targets, TargetToGuild, benefitMaxBase, and benefitMaxChange. If a row is defining that unnecessary value alone, delete the whole row. For instance, in the above example, you would delete the rows pertaining to every set/parameter except Targets.

Run column

The 'Run' column is used to differentiate between otherwise identical symbol definitions spanning multiple model runs. A separate GAMS input file is produced for each model run. Syntax and options for this column are:

  • Values must be integers with the exception of optionally defining a default set of set/parameter values to be used for sets/parameters not explicitly defined for a run. To specify a default value, leave the Run column blank.
  • Values do not have to be sequential, e.g. your runs can be 1, 3, 999, 8, 4.
  • Sets/Parameters do not need to be redefined for every run. By default, the NEOS server will take values not defined for a run from the default run, i.e. defined by a blank value in the Run column. If no defaults are specified, the server assumes the run with the smallest value contains defaults. And if no default value exists for a required symbol, the server will return an error. Thus, symbols required by the GAMS model must be defined either with a default or in the smallest-value run, but can be changed across runs by specifying the run number. See the example definitions.csv for examples.
  • You can specify a range of runs using a tilde, e.g. '5~23' would set the values for a symbol for runs 5, 6, 7, ..., 23. This can be used to either have a single value across the range, or combined with the Values column (see Values section below) to specify a range of values across the runs. Note this should not be used to specify several different values across runs with the exception of using the value range option (see Values section below). Be careful if editing definitions.csv in Excel. You must set the Run column's format to be 'Text', or Excel will automatically convert hyphenated values to dates.

Symbol column

The 'Symbol' column specifies the set/parameter being defined on the current row as well as the specific indices being defined in the 'Values' column. The basic syntax is as follows:

<symbol name>(<first index>,<second index>,...)

For example 'cost(Barriers,removal)' would tell the server that this row defines barrier removal costs over the set of barriers in the network. More on syntax:

  • The <symbol name> must match a set/parameter name in the GAMS model. Since these may change across version of the GAMS model, I will not list them here. However, you can see the example definitions.csv and the GAMS habitat optimization model user guide for specific symbol names and what they mean.
  • Likewise, each index must be either the name of another symbol or of a specific index for which this row is being defined. In the above example, 'Barriers' is the set of barriers and matches the set name in the GAMS model, while 'removal' is one of the Projects (another GAMS model set) that can be executed at a barrier. You will find that both of these are (necessarily) defined in other rows of definitions.csv.
  • When a symbol has no indices to define, simply leave off the parentheses.
  • If two symbol rows share the same run number, they must be either a) different symbols, or b) the same symbol with different indices. Otherwise, the script will fail and indicate where an issue occurred.
  • Symbols do not need to be redefined for every run. By default the server will take symbol values not defined for a run from the default run, i.e. defined by a blank value in the Run column. If no defaults are specified, the server assumes the run with the smallest value contains defaults. And if no default value exists for a required symbol, the script will fail. Thus, symbols required by the GAMS model must be defined either with a default or in the smallest-value run, but can be changed across runs by specifying the run number. See the example definitions.csv for examples.

Values is Column Name column

The 'Values is Column Name' column is a binary/Boolean column specifying whether the values in the 'Values' column comes from table.csv and thus indicates a column name, or is a (series of) value(s) defined in defintions.csv. To indicate the former (a column name), specify 'y'. To indicate the latter (a series of values) specify 'n'. See the next section for more information.

Values column

The 'Values' column specifies either (a) a value for the current set/parameter and its indices being defined on the current row or (b) the name of a column in table.csv corresponding to the values for the symbol on the current row.

How do you know which is needed?

Simple: if the values are indexed over Barriers, you are specifying a column name in table.csv. If the values are defining a symbol over other indices, e.g. Projects, you are specifying the value in the definitions file. Syntax rules are as follows:

  • Anything is allowed. The server will automatically interpret what kind of data you are specifying. To accomplish this, the server reads the GAMS model file and determines the data type based on that. More specifically, anything that is a Set in the GAMS model will be kept as a string, while anything that is Parameter or Scalar will be converted to a floating point value (though this may happen after values are loaded from table.csv). Although, see the special case of specifying a range below.
  • Separate multiple values with a comma and, preferably, no spaces, e.g. 'Fish1,Fish2,...'
  • Numbers should be specified according to Python conventions. The only place this might deviate from expected is when you need to specify positive or negative infinity, in which case you should write either 'inf' or '-inf'.
  • You can specify a range of values by specifying a starting value, ending value (inclusive), and number of values in the range, with each part separated by a colon. For example, to specify 5, 5.5, 6, 6.5, 7, 7.5, 8, you would write '5:8:7'. Note: this must be combined with a run range (see Run section above). As such, this should not be used to specify a range of values within a single run.

Parsing Outputs

For any pair of table.csv and definitions.csv, the NEOS Fishwerks Server will a single CSV containing all the results of all problems specified. In addition, the server will print out all normal execution printouts as well as errors.

The results CSV returned by the service is an aggregation of the results from one or more runs defined in definitions.csv. The results CSV has a header row and is followed by results data. Each row reports the value of a results set/parameter at a single index.
At the time of writing, the header row will have the following column headers:

  1. file: each problem defined in definitions.csv has a unique name in results.csv, and should match with the 'Run' (see definitions.csv). Every row with a particular 'file' will be reporting results for that problem
  2. symbol: name of the set/parameter whose value at a particular index is being reported. See below for the set of symbols reported in the results CSV at the time of writing.
  3. d0: index value (if applicable) of the first dimension of the current results value being reported.
  4. d1: index value (if applicable) of the second dimension of the current results value being reported.
  5. value: value (if applicable) at the current index of the current symbol being reported.

At the time of writing, the following sets/parameters are reported in the results CSV:

  • actions: identities of barriers and projects at which projects should be performed in the optimal solution to the problem. Each row will look like <problem name>,actions,<barrier ID>,<project name>,<blank>
  • budget: budget value for each budget in the problem. Each row will look like <problem name>,budget,<budget name>,<blank>,<budget value>
  • budget_spending: spending on each budget from all projects assigned to that budget in the problem. Each row will look like <problem name>,budget_spending,<budget name>,<blank>,<spending on that budget>
  • cap: upper or lower bound on habitat access for a target. Each row will look like <problem name>,cap,<target name>,<blank>,<cap value>
  • objective: value of the objective function evaluated at the best achieved solution. Each row will look like <problem name>,objective,<blank>,<blank>,<objective value>
  • optimality_gap: proportional gap (between 0 and 1) between the best achieved objective value and the solver's estimate of the best possible objective value, which may or may not be achievable. NaNs or other non-number values typically indicate the objective value is confirmed to be the global optimum. Each row will look like <problem name>,optimality_gap,<blank>,<blank>,<optimality gap value>
  • project_spending: spending on each project at all barriers where the project is to be done in the problem. Each row will look like <problem name>,project_spending,<project name>,<blank>,<spending on that project>
  • solve_time: time taken from the start of the GAMS model until the writing of the output, in seconds. Each row will look like <problem name>,solve_time,<blank>,<blank>,<solve time>
  • target_benefits: total benefits accrued to each target from all projects done in the best achieved solution to the problem. Values are not weighted by target weights. Each row will look like <problem name>,target_benefits,<target name>,<blank>,<total benefit for that target>
  • weight: weight assigned to each target. Each row will look like <problem name>,weight,<target name>,<blank>,<target weight>

Be aware, GAMS does not generally output zeroes. Thus, if in parsing your outputs you find that some records are inexplicably missing and no error was raised by the service, it is highly likely that the value for the particular parameter of question was simply zero. You should always confirm that this is possible, and build your post-processing pipeline to accommodate missing zeroes.

Errors

The NEOS Fishwerks Server will print out any errors encountered during execution. You should monitor the submission page for such errors. If you encounter an error that is not easy to understand or fix, you should contact the author for help. Rather than attempt to exhaustively list all possible errors, here are some common mistakes that will lead to failures of the service:

  • One or more required sets/parameters was not defined, or was only partially defined in defintions.csv.
  • A parameter was not given a default value and one or more runs do not have a defined value for that parameter.
  • A unique combination of run, set/parameter, and indices was defined multiple times in defintions.csv.
  • There is a mismatch between a column name given in definitions.csv and table.csv.
  • One or more cells in table.csv has a blank or otherwise incompatible value for the set/parameter defined in definitions.csv.
  • One or more 'Run' ranges defined with a hyphen in definitions.csv was converted to a date by Excel.
  • A series of values was defined on a row using the colon syntax for a single run in definitions.csv.
  • One or more blank rows were saved in one of the input CSVs. In a text editor, check for rows with only commas.
  • A set/parameter index was not omitted even though it was not meant to be part of the problem specification.
  • A non-integer value was assigned in the 'Run' column in definitions.csv.
  • The total passability achievable at a barrier from doing all projects is <0 or >1.
  • The total achievable benefit at a barrier from doing all projects is <0.

Example Data Files

The following downloadable zip file contains samples that illustrate the concepts described in this document, providing input files that can be submitted to the Fishwerks Solver to practice using this service.

Included materials:

  • Sample 1: A template CSV that employs most of the capabilities of the Fishwerks research model.
    • Four targets, three of which are beneficiary targets for each of the passability guilds, and lamprey.
    • Defines a range of budgets.
    • Defines a range of caps on lamprey.
    • There is a single budget category from which lampricide and removals can draw money.
  • Sample 2: A more traditional Fishwerks series of problems.
    • One, strong-swimmer target.
    • Barrier removal only.
    • Single budget category.
    • Several removal budgets.
  • Sample 3: Illustrates that defined columns in table.csv can be used to vary barrier attributes across runs by specifying in definitions.csv.
    • Identical to definitions_2.csv, except that instead of varying budgets, the cost of removing barriers varies over two runs.
  • Sample 4: Illustrates that the NEOS Fishwerks Server can be used to extend the research model beyond the concepts usually exploited in Fishwerks.
    • Three projects, each of which has its own budget to draw money from. Projects are barrier removal, lampricide application, and restoration of riparian buffers around streams to increase habitat for the strong swimmer. It acts similarly to lampricide, except that it increases habitat instead of decreasing it.
    • Lamprey receive equivalent (but negative) weight to the strong swimmer.
    • Only removal budget changes over runs. Others remain static.