VRX scoring


  • Michael
    replied
    Hi Coline,

    We met to discuss these questions yesterday and review our scoring policies in general. You can expect a new, clarified version of the documentation to come out soon. In some cases, we are also going to make some alterations to the scoring to make sure we are correctly incentivizing the right behavior, and sometimes to eliminate loopholes (some of which you've drawn attention to). Thanks for all your help and careful review of the guidelines.

    I think the new version of the documentation will provide the answers I owe you in response to your questions from "EDIT2" above. If anything is still unclear, though, please let us know!

    Best,
    Michael



  • Coline
    replied
    Hi Michael,
    Yes, we agree that run_time is equivalent to the opposite of what was posted (and what the plugin outputs), since run_time = total_time_allowed - (remaining_time - penalties). When looking at run time, smaller is better; when looking at remaining time, higher is better. That's what got us confused initially, since we were expecting our score to be smaller than NIROM's for World 3.
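    A minimal sketch of that conversion (the function name and the 300 s time limit are assumptions for illustration only, not the actual plugin values):

    ```python
    # Hypothetical illustration of the relationship described above.
    # The plugin posts: remaining_time - penalties (higher is better).
    # Equivalent run time: total_time_allowed - (remaining_time - penalties).

    def run_time_from_posted(posted_score, total_time_allowed=300.0):
        """Convert the plugin's posted value into an elapsed run time
        (lower is better)."""
        return total_time_allowed - posted_score

    # With an assumed 300 s limit, a posted score of 169 corresponds to a
    # shorter run than a posted score of 110:
    print(run_time_from_posted(169.0))  # 131.0
    print(run_time_from_posted(110.0))  # 190.0
    ```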
    Thanks for all the clarifications!
    Coline



  • Michael
    replied
    Hi Coline,

    I believe that sentence on page 14 applies just to tasks 5 and 6. We should probably add some clarification to the document about ties in the navigation task. I think we intended to break ties using total runtime in that task also, but I will confirm.

    Regarding the navigation task score, there are three values we're tracking:
    • run time: the total time it takes to make the run + penalties
    • run score: the rank order of the participants in a particular run, based on run time
    • total task score: sum of all run scores
    It seems like this could also use some clarification, since the value we posted was, as you noticed, not one of the three values listed above, but rather an intermediate calculation that was convenient for computing the run score. However, the official scoring will still work as stated. If you think our scoring is not equivalent to the above, please let me know; we will definitely want to fix it.
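    A rough sketch of that three-level scheme, assuming run score is the 1-based rank of each team's run time and that tied run times share the best rank (illustrative only; exact tie handling is still to be confirmed):

    ```python
    # Hypothetical sketch of the navigation-task scoring described above.
    # run_time: elapsed time for the run plus penalties (lower is better)
    # run_score: rank of that run time among the teams for a given run
    # total_task_score: sum of a team's run scores across all runs

    def run_scores(run_times):
        """Map {team: run_time} to {team: rank}, 1 = fastest."""
        ordered = sorted(run_times.values())
        return {team: ordered.index(t) + 1 for team, t in run_times.items()}

    def total_task_scores(runs):
        """runs is a list of {team: run_time} dicts, one per simulation run."""
        totals = {}
        for run in runs:
            for team, rank in run_scores(run).items():
                totals[team] = totals.get(team, 0) + rank
        return totals

    # Made-up times in seconds for two runs and three teams:
    runs = [{"A": 131.0, "B": 190.0, "C": 150.0},
            {"A": 135.0, "B": 120.0, "C": 140.0}]
    print(total_task_scores(runs))  # {'A': 3, 'B': 4, 'C': 5}
    ```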

    Michael



  • Coline
    replied
    Hi Michael,

    Yes, I was looking at the task descriptions document v1.3, bottom of page 12 and top of page 13. Yes, I meant ties in the total score, but I just saw that it was answered on page 14 ("Any ties are broken based on the total elapsed time for all simulation runs"). Sorry about that.

    Coline



  • Michael
    replied
    Hi Coline,

    Not to dwell too long on this, but are we both looking at the "Scoring" section at the bottom of page 12 in the VRX task descriptions document v1.3? We did contemplate a few versions of scoring so I just want to make sure we don't have any inconsistencies in the documentation.

    I'll need to confirm with the rest of the team before giving you a definitive answer to your other two questions. When you are asking about ties, do you mean ties in the total task score? I'm assuming you're thinking that if the total score is a tie we could break it by looking at the individual run scores across all trials, but please let me know if you meant, for example, ties on a single run.

    Michael



  • Coline
    replied
    Thanks Michael,
    Yes, we figured out that the plugin works a bit differently than specified in the rules. The score is the remaining time rather than time elapsed. A couple more questions were added after the fact as an edit:
    1. If several teams fail to complete the task do all these teams get 12 points or do they get (rank of the last team that completed the challenge)+1?
    2. How will ties be broken?
    Thanks again,

    Coline



  • Michael
    replied
    Coline,

    These questions are great, and much appreciated. Did you answer your own question in this last post, or is there still some confusion?

    Best,
    Michael



  • Coline
    replied
    Hi,

    Sorry for asking so many questions. We have new questions about scoring for Task 4 (navigation challenge). From the rules we understand that the score is the runtime elapsed between when we pass the first gate and when we pass the second gate. We looked at the video generated from the log in World 3, and it looks like we are going through the gate in roughly 30 seconds without hitting any buoys. We then looked at NIROM's video from the same trial, and it looks like their boat takes about 1 minute to go through. Our final score is 169 and their final score is 110, so we were wondering how the scores were computed.
    EDIT: After looking at the plugin we realized that the score is the remaining time - (number of collisions * 10).
    EDIT2: If several teams fail to complete the task, do all these teams get 12 points, or do they get (rank of the last team that completed the challenge) + 1? How will ties be broken?

    Thanks for all the help,

    Coline
    Last edited by Coline; 11-06-2019, 03:09 PM.



  • Coline
    replied
    Hi Michael,

    I just tested video generation and it worked fine. Thanks for fixing the issue.

    Coline



  • Michael
    replied
    Hi Coline,

    That's very clever. Yes, we will certainly change it. Thanks for letting us know. I'll take a look now and create an issue.

    By the way, I saw you were active on the issue regarding video generation in vrx-docker. Did this get resolved?

    Best,
    Michael



  • Coline
    replied
    Thanks Brian,
    For Task 5 (dock), will the scoring plugin be modified? I think right now it uses the same plugin as for scan and dock, and you get 10 points for reporting the default color code (which is how we scored points for that task in dress rehearsal).
    Thanks again,
    Coline



  • brian.bingham
    replied
    Coline and FT Team,

    This is an excellent point about total vs. mean error when the number of identifications is variable. The latest Task Description document is ambiguous on the details of this, as it says, "The localization score is determined by rank ordering teams based on position error between their estimated 2D position and the true, simulated 2D position of the object."

    Let me discuss briefly with the Technical Team and provide some clarification.

    Thank you,
    Brian



  • Coline
    replied
    Thanks Brian and the VRX Team,

    We agree that if the ready duration is long enough to get into position before scoring begins, then the scoring is fair for stationkeeping. Summing the errors is a good scoring method as long as the scores can be expected to be roughly of the same order of magnitude across runs.
    For the perception task scoring, we feel there is a pretty big difference between total error and mean error. If the total error is used, it disadvantages teams that correctly identify more buoys, since their total error will be higher than that of a team that localizes buoys with the same accuracy but identifies fewer of them.
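    A made-up numerical example of that concern (the numbers are purely illustrative):

    ```python
    # Hypothetical numbers: team X identifies 6 buoys, team Y identifies 3,
    # and both localize each identified buoy with 1.0 m of error.
    x_errors = [1.0] * 6
    y_errors = [1.0] * 3

    print(sum(x_errors), sum(y_errors))        # 6.0 3.0 -> total error favors Y
    print(sum(x_errors) / len(x_errors),
          sum(y_errors) / len(y_errors))       # 1.0 1.0 -> mean error treats them equally
    ```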
    Thanks again,

    The Georgia Tech team



  • brian.bingham
    replied
    Thank you for bringing up this issue for clarification.

    For the Stationkeeping task, our intention is to have the goal pose in a location that can be reached by the time the simulation transitions to the ‘Running’ state and scoring begins (i.e., error begins to accumulate). In other words, the duration of the ‘Ready’ state should be long enough to allow the vehicle to navigate to the waypoint. This can be accomplished by setting the goal pose in a location that is reachable within the time specified in the `ready_state_duration` scoring plugin parameter. Currently we have been using a default of 10 seconds for that parameter.
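    A simplified sketch of the timing involved (a bare-bones approximation assumed for illustration; the actual plugin's state machine and parameters may differ):

    ```python
    # Illustration only: pose error counts toward the score only while the
    # task is in the Running state, so a vehicle that reaches the goal during
    # the Ready state starts accumulating error from (near) zero.

    READY_STATE_DURATION = 10.0  # seconds; the default mentioned above

    def stationkeeping_error(samples):
        """samples: list of (time_since_ready_start, pose_error) pairs.
        Sum only the error recorded after the Ready state ends."""
        return sum(err for t, err in samples if t >= READY_STATE_DURATION)

    samples = [(5.0, 2.0), (12.0, 0.3), (20.0, 0.1)]  # made-up data
    print(stationkeeping_error(samples))              # 0.4: only t >= 10 s counts
    ```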

    Based on your recommendation, for Phase 3 we will make sure that the goal pose values and the Ready state duration are set such that the WAM-V can navigate to the goal pose before scoring begins in the `Running` state.

    We understand your concerns about one poor run, and your proposal of rank ordering performance for each trial of the task. However, one poor run would also result in a high ranking for that trial, which would adversely affect the team. We believe that the solution above would help keep each trial independent, so that one poor trial wouldn’t dominate a team’s task performance. Furthermore, unless there is a serious error in the scoring system, we don’t feel we can change the scoring system this late in the competition. The current scoring system is intended to incentivize performance that is robust with respect to environmental and task parameters. Hopefully making sure that the goal pose is reachable in the ready state will alleviate most of the concern.

    With regards to your question about Perception task scoring, here is how we will score the task:
    1. Count the total number of correctly identified objects across all trials. Rank order the teams from most correctly identified to least.
    2. Sum the total position error (equivalently the average) for all correctly identified buoys across all trials. Rank order the teams from lowest to highest total/mean error.
    3. Sum the numerical rankings from 1) and 2), then rank the sum from lowest to highest. Ties are broken by lowest total/mean error value.
    We will try to clarify this by showing the system in the Phase 2 scoring.
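    A rough sketch of that combined ranking, assuming 1-based ranks with ties sharing the best rank and the tie-break from step 3 (illustrative only; the official implementation may differ):

    ```python
    # Hypothetical sketch of the Perception task scoring described above.
    # Each team has (correct identifications, total position error) summed
    # across all trials.

    def rank(values, reverse=False):
        """1-based rank of each value; ties share the best rank."""
        ordered = sorted(values, reverse=reverse)
        return [ordered.index(v) + 1 for v in values]

    def perception_ranking(teams):
        names = list(teams)
        r_correct = rank([teams[n][0] for n in names], reverse=True)  # most correct first
        r_error = rank([teams[n][1] for n in names])                  # lowest error first
        combined = {n: r_correct[i] + r_error[i] for i, n in enumerate(names)}
        # Lowest combined sum wins; ties broken by lower total/mean error.
        return sorted(names, key=lambda n: (combined[n], teams[n][1]))

    # Made-up numbers: (correct identifications, total position error in m)
    teams = {"A": (12, 8.0), "B": (10, 5.0), "C": (12, 9.5)}
    print(perception_ranking(teams))  # ['A', 'B', 'C'] under these assumptions
    ```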

    Thank you for the great comments and questions. We are sure this will help clarify things for other teams as well.

    Please let us know if we can clarify further.

    The VRX Technical Team



  • brian.bingham
    replied
    Coline and Georgia Tech Team,

    Most of the VRX Technical Team is out for the rest of this week (ROSCon and planned holiday), but we'll go over this as a group and clarify the scoring.

    Thank you for the comments!

    Brian


