kenfigure
Kenfigure™: a YAML spec to define Benchling configurations
Project maintained by kennovation1
Benchling schema design style guide
This is an opinionated guide to the stylistic aspects of schema design in Benchling.
You may choose to adhere to this style, or use it as a basis to create your own.
This list is not exhaustive of all possible topics and will likely grow over time.
Suggestions for additions or modifications are always very welcome.
This guide relates to the design of the schema itself and does not attempt to discuss
the style for the structure of the Kenfigure YAML files.
General
- Use common sense
- Be consistent within the model
- Be consistent with industry terminology and practices
- Avoid local shorthands and vernacular
- Don’t assume that names provided by scientists are canonical. Search online to confirm that you are using standard terms.
- Spelling, capitalization, spacing, and punctuation all matter
- Do not allow leading or trailing white space characters (note that Benchling permits this so be careful)
- Do not use consecutive whitespace characters, underscores, or dashes
- Names should be as short as possible without sacrificing clarity. Generally, names should be no longer than 50 characters.
- Avoid use of embedded commas in names/options if at all possible. This is especially true for dropdown option names that might be used in multi-select fields.
- Names should not include unnecessary terms that are implied. For example, names should not include terms like “schema”, “record”, “item”, “results”, etc.
This includes terms that are already covered by an acronym in the name (unless the redundant form is common vernacular). E.g., use “NGS” (or “Next Generation Sequencing”)
and not “NGS Sequencing” or “Next Generation Sequencing (NGS)”.
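Several of the rules above are mechanical enough to check automatically. The following is a minimal sketch, not part of Kenfigure or Benchling: `lint_name`, `MAX_NAME_LENGTH`, and `IMPLIED_TERMS` are hypothetical names, the implied-term list contains only the examples given above, and "consecutive" is interpreted as a run of two or more whitespace characters, underscores, or dashes.

```python
import re

MAX_NAME_LENGTH = 50  # guideline from the list above
IMPLIED_TERMS = {"schema", "record", "item", "results"}  # examples from the list above


def lint_name(name: str) -> list[str]:
    """Return the style problems found in a schema, field, or option name."""
    problems = []
    if name != name.strip():
        problems.append("leading or trailing whitespace")
    # Assumption: "consecutive" means two or more of the same kind in a row.
    if re.search(r"\s{2,}|_{2,}|-{2,}", name):
        problems.append("consecutive whitespace, underscores, or dashes")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    if "," in name:
        problems.append("embedded comma")
    # Compare whole words so that "Itemized" does not trigger "item".
    words = {word.lower() for word in re.findall(r"[A-Za-z]+", name)}
    for term in sorted(IMPLIED_TERMS & words):
        problems.append(f"implied term: {term}")
    return problems
```

For example, `lint_name("Plasmid Schema")` reports the implied term "schema", while `lint_name("Plasmid")` returns an empty list. Rules that require judgment, such as checking for industry-standard terminology, still need a human reviewer.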
Object naming
- Schema and dropdown names should be Title Case
- Dropdown names should be plural nouns (e.g., Strains)
- Schema names (other than Result schemas) should be singular nouns (e.g., Plasmid)
- Result schema names are often procedural or descriptive phrases since they do not represent objects like entity schemas do.
E.g., “Flow Cytometry”, “Body Weight”, “LNP Characterization”
Schema configuration
- Entity schemas should only enable a single naming option in most cases
- ID prefixes should not end with a character that could be confused with a digit. Namely, do not use ‘O’ (letter O), ‘l’ (lowercase L), or ‘I’ (capital I).
- ID prefixes should be as short as possible without sacrificing clarity. Generally, they should be no greater than 8 characters.
- Tooltips should be added to all fields
- Parent link fields for batch (lot) entity schemas should typically be required fields since they may be meaningless without their parent
- The name of a parent link (or entity link) should generally match the name of the entity to which it points.
For example, the parent link field on a Nanoparticle Batch schema should be called “Nanoparticle” if the parent entity schema is called Nanoparticle.
If schema names have prefixes (e.g., v2_) then those should not be part of the field name. For example, if the schema is called v2_Nanoparticle, the
field should be called “Nanoparticle”.
- If a dropdown is very large (e.g., >100 options), consider whether a custom entity schema is more appropriate
- Sample fields in result tables should typically be required since results without an entity may be meaningless.
If there are multiple sample fields, then they should not be required since only 1 of N will be populated.
- Decide carefully whether each result table field should be required: once data is recorded, you cannot add the requirement, and there’s no way to migrate data from
one field to another
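The ID prefix and link-naming rules above can be sketched as two small helpers. This is an illustrative sketch only: `check_id_prefix` and `link_field_name` are hypothetical names, and the version-prefix pattern (`v` followed by digits and an underscore) is an assumption generalized from the single `v2_` example above.

```python
import re

AMBIGUOUS_FINAL_CHARS = {"O", "l", "I"}  # letter O, lowercase L, capital I
MAX_PREFIX_LENGTH = 8  # guideline from the list above


def check_id_prefix(prefix: str) -> list[str]:
    """Return the style problems found in an entity ID prefix."""
    problems = []
    if prefix and prefix[-1] in AMBIGUOUS_FINAL_CHARS:
        problems.append(f"ends with '{prefix[-1]}', which can be confused with a digit")
    if len(prefix) > MAX_PREFIX_LENGTH:
        problems.append(f"longer than {MAX_PREFIX_LENGTH} characters")
    return problems


def link_field_name(target_schema_name: str) -> str:
    """Suggest a link field name: the target schema name without a version prefix."""
    # Assumption: schema prefixes look like "v2_"; adjust the pattern to your own scheme.
    return re.sub(r"^v\d+_", "", target_schema_name)
```

With these definitions, `check_id_prefix("CO")` flags the trailing letter O, and `link_field_name("v2_Nanoparticle")` returns `"Nanoparticle"`.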
Field display names
- Field names should include units in the name where applicable
- Units should be wrapped in parentheses with a space before the open parenthesis. E.g., Mass (mg)
- Units should conform to standard spelling and capitalization. E.g., “mL” not “ml”.
- Exception: units may be omitted from molecular weight fields when they are Dalton or g/mol, since those units are implied.
However, if units are kDa, they must be specified.
- Celsius should be represented as “(C)” with no degree symbol
- Field name capitalization should be consistent. Pick one and stick with it.
  - Sentence case (Ken’s preference for readability). E.g., Lipid batch
  - Title case (very common). E.g., Lipid Batch
- Field names should only use ASCII characters
  - Do not use long dashes, curly single or double quotes, Greek letters, etc.
  - Greek letters should be spelled out or ASCII versions should be used. Follow normal conventions for ASCII equivalents.
    - E.g., “TNF-α” should be called “TNF alpha” (or, less preferably, “TNF-a”)
    - E.g., “µm” should be written as “um”
- Names should be self-documenting, but should not be used as a crutch for training. Names should not be instructions.
- Name use should be consistent across the entire platform. For example, don’t use “Sequence length”, “Seq length”, and “Sequence len”.
Pick one and use it everywhere. Another example might be abbreviations like “conc.” vs. “concentration”. Pick one and use it consistently.
- Conventions may be ignored when they impede easy import of instrument output and would require extensive manual field mapping.
In this case, it may be better to match the instrument output. Note that with the (new) ability to save mappings, matching instrument
field naming is less critical in some cases.
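The ASCII and unit-formatting rules above lend themselves to a small normalizer and linter. The sketch below is illustrative rather than exhaustive: `to_ascii_name`, `lint_display_name`, and both lookup tables are hypothetical, and the tables cover only a handful of characters that you would extend for your own data.

```python
import re

# Single-character substitutions (illustrative, not exhaustive).
REPLACEMENTS = {
    "\u00b5": "u",  # micro sign
    "\u03bc": "u",  # Greek small letter mu, often used in place of the micro sign
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2013": "-", "\u2014": "-",  # en dash and em dash
}
# Greek letters that are spelled out (illustrative, not exhaustive).
GREEK_NAMES = {"\u03b1": "alpha", "\u03b2": "beta", "\u03b3": "gamma"}


def to_ascii_name(name: str) -> str:
    """Replace common non-ASCII characters with ASCII equivalents."""
    for char, ascii_char in REPLACEMENTS.items():
        name = name.replace(char, ascii_char)
    for letter, spelled in GREEK_NAMES.items():
        # Drop a joining hyphen so that "TNF-<alpha>" becomes "TNF alpha".
        name = re.sub(rf"-?{letter}", f" {spelled}", name)
    return re.sub(r" {2,}", " ", name).strip()


def lint_display_name(name: str) -> list[str]:
    """Return the style problems found in a field display name."""
    problems = []
    if not name.isascii():
        problems.append("non-ASCII characters")
    if re.search(r"\S\(", name):
        problems.append("missing space before '('")
    return problems
```

Under these assumptions, `to_ascii_name("TNF-α")` returns `"TNF alpha"` and `lint_display_name("Mass(mg)")` reports the missing space before the parenthesis. Unit spelling (e.g., “mL” vs. “ml”) is not checked here because it would require a curated list of accepted units.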
Field system names
- System names should contain a representation of the units that appear in the field name. In a ratio, the ‘/’ should be converted
to an underscore (‘_’) (note that Benchling will silently drop the slash when auto-generating system names).
E.g., “Concentration (mg/mL)” should have a system name “concentration_mg_ml”.
- System names representing percent units should use a “_pct” suffix (note that Benchling silently drops the ‘%’ symbol when auto-generating system names).
E.g., The display name “Purity (%)” should have a system name “purity_pct”.
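The two system-name rules above can be expressed as a short conversion function. This is a sketch of the convention described in this guide, not of Benchling's own auto-generation logic; `system_name` is a hypothetical helper, and it assumes the display name is already ASCII.

```python
import re


def system_name(display_name: str) -> str:
    """Derive a snake_case system name that keeps the units from the display name."""
    name = display_name.replace("%", " pct ")  # percent units become a "pct" token
    name = name.replace("/", "_")  # ratio slash becomes an underscore
    # Collapse every remaining run of spaces, parentheses, and punctuation into one underscore.
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    return name.strip("_").lower()
```

Applied to the examples above, `system_name("Concentration (mg/mL)")` returns `"concentration_mg_ml"` and `system_name("Purity (%)")` returns `"purity_pct"`.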