Fiehn Lab - Frequently Asked Questions (FAQ)

Below are several questions from participants, each followed by its answer.

Is it possible to only participate in one of the categories (e.g. structure prediction?)
- Yes, you can participate in one category of your choice.
How will the tasks be evaluated? For example, the structure prediction task appears to call for a single InChIKey or SMILES. Will that task be evaluated on similarity to the actual structure, or on exact match? How will those be measured?
- We plan to evaluate matches by the first block of the InChIKey, which must match exactly. If SMILES are provided, we will use them to match, but our preference is InChIKey. Results are scored as match or no match.
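The first-block comparison described in this answer can be sketched as follows. This is only a minimal illustration, not the organizers' evaluation script; the function names `first_block` and `is_match` are hypothetical, and the second key in the usage line is made up so that it shares only the first block with caffeine's InChIKey.

```python
def first_block(inchikey: str) -> str:
    """Return the first (connectivity) block of an InChIKey."""
    return inchikey.strip().upper().split("-")[0]


def is_match(submitted: str, reference: str) -> bool:
    """Match / no match: exact equality of the first InChIKey blocks."""
    return first_block(submitted) == first_block(reference)


# Caffeine's InChIKey versus a made-up key that differs after the first block.
print(is_match("RYYVLZVUVIJVGH-UHFFFAOYSA-N", "RYYVLZVUVIJVGH-XXXXXXXXXX-N"))  # True
```

Under this scheme, keys that differ only in the second or third block still count as a match.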
“Compounds are not included in public libraries” [is stated] on the website. [Do] public libraries refer to the molecular database or spectral database?
- They refer to spectral databases or libraries.
The MS/MS spectra of compounds 81, 282, 432, and 476 are missing from the msp file. Are they available?
- Those challenges have been fixed but will not count in the evaluation. Please see the main page for correct information. Excel sheets with corrected information are shared in the Google drive folders 1 & 2 here.
In previous CASMI competitions, a candidate list was provided, and certain metrics (topk, rank, etc.) were computed. I suppose this year that no candidate list is provided, and only the top-ranked compound is relevant?
- Yes, there is no candidate list. We are only looking for your best-ranked compound or answer in this year's contest.
Is it possible to participate in the challenge with external software and still submit the results?
- Yes, you can participate with external software. The idea behind CASMI is to get an evaluation and feedback on the status of compound ID. So, if it's a workflow you have with external software or even your own software you've developed - this is the space to test those different approaches.
It seems that some data are missing. From what I can download there are only 21 folders in positive mode. Is this really all for approx. 120 or so compounds?
- The data is not missing. What you are referring to is the bonus challenges in neg mode, which have 21 files. The bonus challenge sheet lists the file to be used, RT (min), precursor m/z, and category; if you sort it by neg only, it shows 77 compounds, because some files contain multiple compounds.
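As a rough sketch of that check, the snippet below filters a sheet to neg mode and counts challenges per file. The column names (`file`, `rt_min`, `precursor_mz`, `mode`) and all rows are made-up placeholders; the real bonus challenge sheet may be laid out differently.

```python
import csv
import io
from collections import Counter

# Made-up placeholder rows; the real sheet's column names and values may differ.
SHEET = """file,rt_min,precursor_mz,mode
bonus_neg_01,1.20,200.0000,neg
bonus_neg_01,3.45,300.0000,neg
bonus_neg_02,2.10,250.0000,neg
bonus_pos_01,4.00,400.0000,pos
"""

# Keep only neg-mode rows, then count how many challenges share each file.
rows = [r for r in csv.DictReader(io.StringIO(SHEET)) if r["mode"] == "neg"]
per_file = Counter(r["file"] for r in rows)

print(len(rows))      # neg-mode challenges in the placeholder data: 3
print(len(per_file))  # neg-mode files in the placeholder data: 2
```

The number of challenges can exceed the number of files whenever one file holds more than one compound, which is the situation the answer describes.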
Previous rounds specified which compounds are natural and which are synthetic. I know you try to mimic reality, but the reality is that we do know where the sample is coming from. Labelling which compound is synthetic and which is natural would be realistic and fair.
- At this stage of CASMI, we don't want to change the rules or the information provided thus far unless there is an error that impacts results, such as an incorrect precursor m/z.
Based on the submission template it appears that this time you are going only for a single answer submission. Is that correct?
- Yes, we are going for single-answer submissions.
What do you mean by compound class in the submission template? For example, what do you put for delavirdine?
- Compound class refers to a chemical classification. Many participants use ChEBI, ClassyFire, ClassyFire Batch, NPClassifier, etc. For the example you provided, I would say that the class is Diazinanes, but more specifically:
  - Kingdom: Organic compounds
  - Superclass: Organoheterocyclic compounds
  - Class: Diazinanes
  - Subclass: Piperazines
  - Parent Level 1: Pyridinylpiperazines
Are all compounds known? I figured few if any spectra are in the databases, but are the compounds themselves known?
- In the priority set, compounds are not included in public libraries.
From "we here provide raw LC-accurate mass MS/MS data in both +ESI and -ESI mode, with a list of 500 compounds and their retention times and accurate m/z values": it isn't clear whether *each* sample/compound has both positive and negative ion mode data available, or not (necessarily).
- A challenge is only found in either pos or neg, but not both.
"Compounds are not included in public libraries": I am trying to make sure of the exact meaning here. Does it mean that the *compounds* are not included in public fragmentation spectral libraries, i.e. that there are no corresponding fragmentation spectra for the given compound in those libraries? Or does it mean only that these particular fragmentation spectra were not deposited in public fragmentation spectral libraries?
- Yes, they were neither deposited nor are there any corresponding fragmentation spectra for the given compound in public fragmentation spectra libraries.
Evaluation: the meaning of "Correct compound structure class" is a bit vague and might be problematic for the evaluation. First, there are many ontologies available, so it will be quite challenging to compare results. Second, the ontologies have different possible depths for a compound, and there are cases where multiple classes are possible. For example, if the annotation is correct at the superclass level but not at the class level, will the class result be considered wrong? And conversely, if only the superclass is provided (and correct), will the class challenge be considered correct?
- This CASMI explores how the community approaches such challenges. Hence, we do not prescribe which classification tools scientists use. We will use our best judgment to make rational calls for "correct classifications". By and large, we anticipate using higher-order classifications (such as "flavonoids" rather than "flavonoid-O-glycosides"), but certainly more detailed than "organic oxygen compound".
The mass spec data have very limited MS2 coverage: was an inclusion list used? Some dynamic exclusion? Or were some filters used during the mzML file conversion?
- Yes, an inclusion list was used during the acquisition, dynamic exclusion was set to 0.8 s, and no filters were used during mzML file conversion.
Are multiple submissions (with different tools) possible per user/team?
- There is no limit on the number of submissions with different tools per user/team. For example, if you have 2 tools, you can submit LF1 and LF2. However, for any given challenge we ask for single-answer submissions.
On the website (https://fiehnlab.ucdavis.edu/casmi), it is stated that evaluation will be based on 4 criteria. When opening the submission template, it looks like a 5th one appears: "Monoisotopic Mass". Is it disregarded? Can we leave it empty? I assume having the right monoisotopic mass is trivial once you have the formula, so I just wanted to be sure this can be skipped.
- Yes, that's correct. That column shouldn't have been there, and it has been removed to avoid any confusion.
What is the formalism to respect when reporting the adducts? Can those also include clusters or neutral losses?
- Yes, you can add anything you wish - we don't have any restrictions on what adduct a participant should use.
Why are the InChIKey and SMILES confounded? I am not very confident about this, especially for compounds absent from databases.
- They were selected because they are the most common compound identifiers.
What is the ‘chemical classification’ to use to report the compound class? Using ChEBI, ClassyFire, or NPClassifier, to cite only a few, will lead to very different results… I did not read any rules about it, so I was just wondering.
- This CASMI explores how the community approaches such challenges. Hence, we do not prescribe which classification tools scientists use. We will use our best judgment to make rational calls for "correct classifications". By and large, we anticipate using higher-order classifications (such as "flavonoids" rather than "flavonoid-O-glycosides"), but certainly more detailed than "organic oxygen compound".
Considering the number of challenges [is it possible to do a subset]?
- Yes, we welcome any amount that you can do within your schedule, even if it's 10. We just really want to know which approach people use today, compared to 5 years ago. If time permits, we still encourage your participation!
Submitters are allowed to participate in some or all of the 4 tasks; however, the results of earlier tasks will affect the later ones. If the adduct is annotated incorrectly, will any subsequent formula annotations be counted as incorrect? For example, an adduct of [M+H-H2O]+ is wrongly annotated as [M+H]+ in Task 1, and a formula annotation of C16H20O2 is provided (correct: C16H22O3) in Task 2. Will this formula annotation be counted as correct?
- Yes, participants are allowed to participate in some or all four tasks. If the adduct selection is incorrect, the formula will most likely also be incorrect. In your example, the formula will not be counted as correct. Even so, you don't have to do all 250 or 500 to participate. We welcome any amount that you can do within your schedule, even if it's 10. We just really want to know which approach people use today.
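The arithmetic behind the questioner's example can be sketched as below, using standard monoisotopic masses. The helper `mono_mass` is a hypothetical, simplified formula parser written for this illustration (no brackets, charges, or isotopes); it is not part of any CASMI tooling.

```python
import re

# Monoisotopic masses (Da) for the elements used in this example.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727646  # mass of H+ (hydrogen atom minus one electron)
WATER = 2 * MONO["H"] + MONO["O"]


def mono_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C16H22O3'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


true_neutral = mono_mass("C16H22O3")
observed_mz = true_neutral + PROTON - WATER  # the ion actually formed: [M+H-H2O]+

# Reading the same m/z as [M+H]+ gives a neutral mass that is one H2O too light.
wrong_neutral = observed_mz - PROTON

print(round(observed_mz, 4))                               # 245.1536
print(abs(wrong_neutral - mono_mass("C16H20O2")) < 1e-6)   # True
```

The wrongly inferred neutral mass corresponds exactly to C16H20O2, which is why an incorrect adduct assignment usually carries through to an incorrect formula.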

If you have additional questions please contact: Arpana Vaniya