VRX scoring

  • VRX scoring

    Hi,

    We have some questions about scoring.
    1. Stationkeeping
    From the rules we understood that scores from each run would be summed and the teams ranked according to that summed score. However, for World3, where the goal pose is really far away, the score will be much larger than for other worlds, since traveling to the position takes some time. We think the variance of scores for that world will be much higher than for other worlds and is likely to swamp the smaller differences between teams, making it more of a race task.
    Rank ordering each run (similar to what is done for Task 4), or normalizing the score in some other way, would solve that issue.
    2. Wayfinding
    We think summing errors for that task has the same issue as stationkeeping: a single poor run can erase good performance in every other run.
    3. Perception
    We just want to make sure that we've properly understood the scoring for this task. The score is computed by first averaging the total position error, second by summing the total number of correctly identified buoys, and then by rank ordering both scores and combining (summing?) the rank orders. Is that correct?

    We would also like to hear what other teams think about the scoring.

    Thanks,

    The Georgia Tech team

  • #2
    Coline and Georgia Tech Team,

    Most of the VRX Technical Team is out for the rest of this week - ROSCon and a planned holiday - but we'll go over this as a group and clarify the scoring.

    Thank you for the comments!

    Brian


    • #3
      Thank you for bringing up this issue for clarification.

      For the Stationkeeping task, our intention is to have the goal pose in a location that can be reached by the time the simulation transitions to the ‘Running’ state and scoring begins (error starts to accumulate). In other words, the duration of the ‘Ready’ state should be long enough to allow the vehicle to navigate to the waypoint. This can be accomplished by setting the goal pose in a location that is reachable within the time specified by the `ready_state_duration` scoring plugin parameter. Currently we have been using a default of 10 seconds for that parameter.

      Based on your recommendation, for Phase 3 we will make sure that the goal pose values and the Ready state duration are set such that the WAM-V can navigate to the goal pose before scoring begins in the `Running` state.
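
      As a back-of-the-envelope sketch of what "reachable before scoring begins" means in practice (the cruise speed and coordinates below are made-up illustrative numbers, not official VRX values):

      ```python
      # Rough feasibility check: can the WAM-V plausibly reach the goal pose
      # before the 'Running' state starts? All numbers here are illustrative.
      import math

      ready_state_duration = 10.0   # seconds (the current default mentioned above)
      cruise_speed = 2.0            # m/s, an assumed nominal WAM-V speed

      def goal_reachable(start_xy, goal_xy):
          """True if the straight-line trip fits within the Ready state."""
          distance = math.dist(start_xy, goal_xy)
          return distance / cruise_speed <= ready_state_duration

      print(goal_reachable((0.0, 0.0), (15.0, 5.0)))   # True  (~15.8 m needs ~7.9 s)
      print(goal_reachable((0.0, 0.0), (60.0, 40.0)))  # False (~72 m would need ~36 s)
      ```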

      We understand your concerns about one poor run, and your proposal of rank ordering performance for each trial of the task. However, one poor run would still produce a poor (numerically high) ranking for that trial, which would adversely affect the team. We believe that the solution above will help keep each trial independent, so that one poor trial won't dominate a team's task performance. Furthermore, unless there is a serious error in the scoring system, we don't feel we can change it this late in the competition. The current scoring system is intended to incentivize performance that is robust with respect to environmental and task parameters. Hopefully, making sure that the goal pose is reachable during the Ready state will alleviate most of the concern.

      With regards to your question about Perception task scoring, here is how we will score the task:
      1. Count the total number of correctly identified objects across all trials. Rank order the teams from most correctly identified to least.
      2. Sum the total position error (equivalently the average) for all correctly identified buoys across all trials. Rank order the teams from lowest to highest total/mean error.
      3. Sum the numerical rankings from 1) and 2), then rank the sum from lowest to highest. Ties are broken by lowest total/mean error value.
      We will try to clarify this by showing the system in the Phase 2 scoring.
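
      A minimal sketch of that combined ranking, assuming each team has already been reduced to a count of correct identifications and a total position error; the team names and numbers are hypothetical:

      ```python
      # Sketch of the combined Perception ranking described above.
      # Each team: (correct identifications across all trials, total position error in metres).
      teams = {
          "team_a": (18, 12.4),
          "team_b": (16, 15.0),
          "team_c": (15, 9.1),
      }

      # 1) Rank by correct identifications, most to least (rank 1 is best).
      id_rank = {t: r for r, (t, _) in enumerate(
          sorted(teams.items(), key=lambda kv: -kv[1][0]), start=1)}

      # 2) Rank by total position error, lowest to highest.
      err_rank = {t: r for r, (t, _) in enumerate(
          sorted(teams.items(), key=lambda kv: kv[1][1]), start=1)}

      # 3) Sum the two ranks; lower is better. Break ties by lower total error.
      final = sorted(teams, key=lambda t: (id_rank[t] + err_rank[t], teams[t][1]))
      print(final)  # ['team_a', 'team_c', 'team_b']
      ```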

      Thank you for the great comments and questions. We are sure this will help clarify things for other teams.

      Please let us know if we can clarify further.

      The VRX Technical Team

      • #4
        Thanks Brian and the VRX Team,

        We agree that if the ready duration is long enough to get into position before scoring begins then the scoring is fair for stationkeeping. Summing the errors is a good scoring method as long as the scores can be expected to be roughly of the same order of magnitude across runs.
        For the perception task scoring, we feel there is a pretty big difference between total error and mean error. If the total error is used, it disadvantages teams that correctly identify more buoys, since their total error will be higher than that of a team that localizes buoys with the same accuracy but identifies fewer of them.
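
        A tiny numeric illustration of this point, with made-up accuracy and buoy counts:

        ```python
        # Two hypothetical teams localizing every identified buoy with the same 1.0 m error.
        per_buoy_error = 1.0
        identified = {"many_ids": 10, "few_ids": 5}

        total_error = {t: n * per_buoy_error for t, n in identified.items()}
        mean_error  = {t: per_buoy_error for t in identified}

        print(total_error)  # {'many_ids': 10.0, 'few_ids': 5.0} -> more IDs looks worse
        print(mean_error)   # {'many_ids': 1.0, 'few_ids': 1.0}  -> identical accuracy
        ```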
        Thanks again,

        The Georgia Tech team

        • #5
          Coline and GT Team,

          This is an excellent point about total vs. mean error when the number of identifications is variable. The latest Task Description document is ambiguous on the details of this, as it says, "The localization score is determined by rank ordering teams based on position error between their estimated 2D position and the true, simulated 2D position of the object."

          Let me discuss briefly with the Technical Team and provide some clarification.

          Thank you,
          Brian

          • #6
            Thanks Brian,
            For Task 5 (dock), will the scoring plugin be modified? I think right now it uses the same plugin as for scan and dock, and you get 10 points for reporting the default color code (which is how we scored points for that task in the dress rehearsal).
            Thanks again,
            Coline

            • #7
              Hi Coline,

              That's very clever. Yes, we will certainly change it. Thanks for letting us know. I'll take a look now and create an issue.

              By the way, I saw you were active on the issue regarding video generation in vrx-docker. Did this get resolved?

              Best,
              Michael

              • #8
                Hi Michael,

                I just tested video generation and it worked fine. Thanks for fixing the issue.

                Coline

                • #9
                  Hi,

                  Sorry for asking so many questions. We have new questions about scoring for Task 4 (navigation challenge). From the rules we understand that the score is the run time elapsed between when we pass the first gate and when we pass the second gate. We looked at the video generated from the log in World3, and it looks like we are going through the gate in roughly 30 seconds without hitting any buoys; we then looked at NIROM's video from the same trial, and it looks like their boat takes about 1 minute to go through. Our final score is 169 and their final score is 110, so we were wondering how the scores were computed.
                  EDIT: After looking at the plugin we realized that the score is the remaining time minus (the number of collisions * 10).
                  EDIT2: If several teams fail to complete the task, do all of these teams get 12 points, or do they get (rank of the last team that completed the challenge) + 1? How will ties be broken?

                  Thanks for all the help,

                  Coline
                  Last edited by Coline; 11-06-2019, 03:09 PM.

                  • #10
                    Coline,

                    These questions are great, and much appreciated. Did you answer your own question in this last post, or is there still some confusion?

                    Best,
                    Michael

                    • #11
                      Thanks Michael,
                      Yes, we figured out that the plugin works a bit differently than specified in the rules: the score is the remaining time rather than the time elapsed. A couple more questions were added after the fact as an edit:
                      1. If several teams fail to complete the task, do all of these teams get 12 points, or do they get (rank of the last team that completed the challenge) + 1?
                      2. How will ties be broken?
                      Thanks again,

                      Coline

                      • #12
                        Hi Coline,

                        Not to dwell too long on this, but are we both looking at the "Scoring" section at the bottom of page 12 in the VRX task descriptions document v1.3? We did contemplate a few versions of scoring so I just want to make sure we don't have any inconsistencies in the documentation.

                        I'll need to confirm with the rest of the team before giving you a definitive answer to your other two questions. When you are asking about ties, do you mean ties in the total task score? I'm assuming you're thinking that if the total score is a tie we could break it by looking at the individual run scores across all trials, but please let me know if you meant, for example, ties on a single run.

                        Michael

                        • #13
                          Hi Michael,

                          Yes I was looking at the task descriptions document v1.3, bottom of page 12, top of page 13. Yes I meant ties in the total score, but I just saw that it was answered on page 14 ("Any ties are broken based on the total elapsed time for all simulation runs"). Sorry about that.

                          Coline

                          • #14
                            Hi Coline,

                            I believe that sentence on page 14 applies just to tasks 5 and 6. We should probably add some clarification to the document about ties in the navigation task. I think we intended to break ties using total runtime in that task also, but I will confirm.

                            Regarding the navigation task score, there are three values we're tracking:
                            • run time: the total time it takes to make the run + penalties
                            • run score: the rank order of the participants in a particular run, based on run time
                            • total task score: sum of all run scores
                            It seems like this could also use some clarification, since the value we posted was, as you noticed, not one of the three values listed above, but rather an intermediate calculation that was convenient for computing the run score. However, the official scoring will still work as stated. If you think our scoring is not equivalent to the above, then please let me know; we will definitely want to fix it.
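
                            A small sketch of those three values, assuming per-run elapsed times in seconds and the 10-per-collision penalty mentioned earlier in the thread; team names and numbers here are made up:

                            ```python
                            # Navigation-task bookkeeping sketch (illustrative numbers only).
                            PENALTY = 10.0  # assumed penalty added per collision

                            # elapsed[run][team] = seconds between gates; hits[run][team] = collisions
                            elapsed = [{"team_x": 30.0, "team_y": 60.0},
                                       {"team_x": 45.0, "team_y": 40.0}]
                            hits    = [{"team_x": 0,    "team_y": 0},
                                       {"team_x": 0,    "team_y": 1}]

                            total_score = {team: 0 for team in elapsed[0]}
                            for run_elapsed, run_hits in zip(elapsed, hits):
                                # run time = elapsed time + penalties (lower is better)
                                run_time = {t: run_elapsed[t] + PENALTY * run_hits[t]
                                            for t in run_elapsed}
                                # run score = rank within this run by run time (1 is best)
                                for rank, team in enumerate(sorted(run_time, key=run_time.get), 1):
                                    total_score[team] += rank

                            # total task score = sum of run scores (lower is better)
                            print(total_score)  # {'team_x': 2, 'team_y': 4}
                            ```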

                            Michael

                            • #15
                              Hi Michael,
                              Yes, we agree that run_time is equivalent to the opposite of what was posted (and what the plugin outputs), since run_time = total_time_allowed - (remaining_time - penalties). When looking at run time, smaller is better; when looking at remaining time, higher is better. That's what got us confused initially, since we were expecting our score to be smaller than NIROM's for World3.
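
                              A tiny numeric check of that relationship, using made-up values (the trial limit below is only an assumption):

                              ```python
                              # run_time = total_time_allowed - (remaining_time - penalties)
                              # Made-up numbers: a higher posted value (remaining time minus
                              # penalties) maps to a lower, i.e. better, run time.
                              total_time_allowed = 300.0          # assumed per-trial limit (s)
                              posted = {"team_x": 250.0, "team_y": 220.0}
                              run_time = {t: total_time_allowed - v for t, v in posted.items()}
                              print(run_time)  # {'team_x': 50.0, 'team_y': 80.0} -> team_x faster
                              ```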
                              Thanks for all the clarifications!
                              Coline
