
Scoring

Global Rank: 66
Totalscore: 227554
Posts: 245
Thanks: 420
UpVotes: 281
Registered: 15y 94d
shadum`s Avatar







Last Seen: 137d 9h
The User is Offline
RE: Scoring
Quote from Kender
Mar 06, 2011 - 21:52:46

I also never liked having challenge count entering the equations, but for a different reason, I feel it rewards people for doing the same repetitive things over and over. Which in my opinion is what happens on sites with a high challenge count.


Some sites, yes, probably so. I don't think this is true of all high challenge count sites-- for example, I don't think it is true of Hacker.org (~283 challenges). As far as repetition goes, I'd say that where you find the biggest repetition of challenges is in the easier 10-15% of the challenges on a site, since there does seem to be a standard set of 'starter' challenges. I can't say that I've noticed a lot of repetition on high challenge count sites-- some, yes, but a lot, no. This, of course, is a good reason to score the early percentages lower, but I do believe that the difficulty values passed to WeChall by most sites adequately compensate without further manipulation by WeChall's scoring mechanisms.

Quote from Kender
Mar 06, 2011 - 21:52:46

WeChall's scoring aims to reflect the amount or level of skill someone has. Not how much time they are willing to spend doing very similar things.


I absolutely agree with that goal.

Quote from Kender
Mar 06, 2011 - 21:52:46
"points per challenge" is never going to work with that.


I don't think this is as true as you do.

Quote from Kender
Mar 06, 2011 - 21:52:46
And the difficulty of individual challenges can never be taken into account here.


The difficulty of individual challenges can probably never fully be taken into account, but that information is reflected, mostly, in the scores sent to WeChall. You say below that this doesn't matter, but I'm not sure how or why it doesn't. True, WeChall has to depend upon the linked sites' difficulty calculations, but I can't think of many sites where they are badly wrong. However, WeChall can infer challenge difficulty using some of the information sent in from the linked sites.

WeChall can also infer the difficulty of a site as a whole from some of the user statistics and also from the 'Dif' column of the 'Sites' table, which is subjective but looks about 80-85% correct to me.

Quote from Kender
Mar 06, 2011 - 21:52:46
So yes, solving a challenge of difficulty X on a site with a lot of challenges is going to be worth less than solving a challenge of difficulty X on a site with fewer challenges. This is fair when you think about it for a bit and factor in the choice of sites to play on and the challenges to choose from on each site.


I don't understand how this is fair or how it does anything but encourage the playing of easy sites and easy challenges on those sites. It also discourages playing the bigger sites, which, for one, isn't really fair to those larger sites and, for another, discourages playing harder challenges, at least in some cases.

Quote from Kender
Mar 06, 2011 - 21:52:46
Try to shift your thinking from being rewarded for solving a challenge to being rewarded for solving a site.


I understand. This is part of why I used to think the challenge count weighting made some sense. I don't think so anymore. It doesn't work well when some sites have 80 challenges and some have 250 and some have 2000+.

Quote from Kender
Mar 06, 2011 - 21:52:46
I recommend against using the user-voted "dif" column. It is extremely unreliable. For some people it represents how difficult it is, for others how easy it is and for yet others how balanced the difficulty is. Also, most sites have no more than a handful of votes.


I know. That is the worst part of my idea because it depends upon subjective evaluations of difficulty and because voting is sparse. Still, when I look at those 'Dif' values they do not look very far from true. You could probably solve the sparse voting issue by requiring a vote at, say, 75% solved or so, but that is pretty draconian and unfriendly. It is probably better to beg for voluntary voting.

Quote from Kender
Mar 06, 2011 - 21:52:46
The "average" column is a much better indicator of the difficulty of completing a site.


Of the sites I've played in earnest, I'd say the "average" column is a horrible representation of difficulty-- much worse than the subjective 'Dif' column. The 'Average' column makes Ma's and Electrica mid-range difficulty sites, which is very wrong, and it makes Hax.tor the easiest, which is also very wrong. And SPOJ is the hardest, I'd wager, only because it is so huge. My guess is that it should probably be toward the top of the hard sites but not the absolute most difficult, judged by difficulty not by sheer mass.

I wouldn't be opposed to using both columns though.

Quote from Kender
Mar 06, 2011 - 21:52:46
Whether or not a site internally also uses difficulty for scoring (like rosecode or hackquest) makes little difference. 0% complete is still 0 points and 100% complete is still max points. It's just the curve that's a bit different.


That doesn't make sense. A challenge on Rosecode that was a difficulty there of '4' was worth 122 points here. A challenge there of difficulty '25' was worth 649 points. And I solved these back to back, so the difference isn't due to WeChall's weighted scoring. I didn't gain a lot of points and thus get the point bonus that high percentages give. That difficulty difference, in the form of site percentage or challenge score (I'm not exactly sure which), was passed to WeChall by Rosecode. And Rosecode is not the only place where I've noticed this. Conversely, CSTutoring does not internally rate by difficulty, and you can see a very smooth per-challenge point change as you work through the site. Some sites do pass internal difficulty calculations to WeChall. That has to matter if you want fairness and if you want to approximate a measurement of 'skill'.

Quote from Kender
Mar 06, 2011 - 21:52:46
My solution:
I propose we ignore all challenges over 200 per site for the score. That should take care of the problems you described.


It would be great if you could explain this in more detail. As is, I don't understand it at all. Do you suggest that 200 challenges is the most that will ever be scored? Aren't you de facto cutting the harder challenges right out of the ranking? At least in some cases, but not all, you could intentionally choose the harder challenges first, but why? This solution seems pretty at odds with the idea of measuring skill.

It seems like this would also weight the sites very strangely since all sites with 200 or more challenges are suddenly equivalent.
Global Rank: 1
Totalscore: 760035
Posts: 431
Thanks: 491
UpVotes: 456
Registered: 14y 246d












The User is Offline
RE: Scoring
@shadum: No offence, but your interpretation of the current scoring doesn't seem to make sense to me. First of all, the scores are not normalised at all. In fact, I'd say the opposite. The number of challenges and the average %solved are explicitly added to score sites differently according to their "difficulty" and size. Normalising would mean making site_score the same constant for each site (or at the very least removing the number of challenges from the definition). The exponent you highlighted in the score calculation only affects the "shape" of the mapping of a user's %solved to % of site_score.

B.t.w.: It also seems that you and Kender are referring to two different uses of the challenge count in the calculation. I believe Kender is actually suggesting to move to normalisation.

Secondly, you say that the "idea behind weighting challenges by percentage solved seems to be to prevent people from solving a bunch of easy challenges on multiple sites and thereby racking up a lot of points for easy work." I believe that it is meant as an automatic way to reflect a site's difficulty in its score. Compensating for easy challenges is done by the exponentiation: low %solved gets low score per percentage point.

Now what I actually wanted to post about. ;)

I would like to suggest that "we" forget about making the scoring reflect people's intelligence/skill/whatever. I don't believe there is any way to get close to it and it will always be subject to gaming. The fact is that the available data is practically meaningless. Sure, there is a reasonable likelihood that the people at the top are pretty smart. But is the person at #1 really smarter than the person at #10? Or #20? Or even #1000? I doubt it. If you want that kind of ranking the best thing you can probably do is create something similar to an IQ test or the Olympic Games. But even then I'd remain quite sceptical about the meaning of the results.

How about making it about the challenges instead of ourselves? Just stimulating people to do challenges. And preferably push them towards the harder or more interesting ones.

I'd say score challenges according to difficulty and let that be all. Like shadum mentioned, why would the same challenge get a different score just because it's on a different site?

Of course, scoring each challenge at WeChall is currently not possible and in general probably not desirable (in terms of amount of work and the qualification itself). Given the available data, the current scoring method might be pretty good. Personally, I would change it a bit to the following:

site_score = general_diff * chall_count
user_score = site_score * pow(p_solved, e)

Here general_diff would represent the general difficulty of a site, which is used to compensate for differences in scoring of challenges by sites (e.g. site A gives challenge C0 score 5 and challenge C1 score 10 while site B scores them 1 and 2, respectively). This could be based on some statistic(s) as discussed before or just a constant that can be adjusted by the WeChall admins (similar to base_score now).

With e you'd be able to compensate for lack of differentiation within a site (e.g. all challenges get the same score). Use e=1 when a site's scoring is adequate and move it towards 2 where it's not. This rests, as the current scoring does, on the idea that a low p_solved most likely means that only the easier challenges have been done.
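
To make the two lines above concrete, here is a minimal Python sketch of the proposed formula. The function names mirror the post; the numbers in the usage note (a general_diff of 10 and a site with 100 challenges) are made up for illustration and are not WeChall data.

```python
def site_score(general_diff: float, chall_count: int) -> float:
    """Maximum score for a site: general difficulty times challenge count."""
    return general_diff * chall_count


def user_score(general_diff: float, chall_count: int,
               p_solved: float, e: float = 1.0) -> float:
    """Score of a user who has solved the fraction p_solved (0..1) of a site.

    e = 1 gives a linear mapping; e closer to 2 gives fewer points per
    percentage point at low p_solved.
    """
    if not 0.0 <= p_solved <= 1.0:
        raise ValueError("p_solved must be a fraction between 0 and 1")
    return site_score(general_diff, chall_count) * pow(p_solved, e)


if __name__ == "__main__":
    # Hypothetical site: general_diff = 10, 100 challenges, user at 50%.
    print(user_score(10, 100, 0.5, e=1.0))  # linear mapping
    print(user_score(10, 100, 0.5, e=2.0))  # quadratic mapping
```

With these made-up inputs, the user at 50% gets half of the site score for e=1 and a quarter of it for e=2, which is the "shape" effect of the exponent discussed above.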
Totalscore: 317037
Posts: 98
Thanks: 105
UpVotes: 105
Registered: 14y 256d







Last Seen: 12d 8h
The User is Offline
RE: Scoring
Quote from dloser
Mar 07, 2011 - 07:19:50

How about making it about the challenges instead of ourselves? Just stimulating people to do challenges. And preferably push them towards the harder or more interesting ones.

I'd say score challenges according to difficulty and let that be all. Like shadum mentioned, why would the same challenge get a different score just because it's on a different site?



Yes! +1
https://www.revolutionelite.co.uk/
Global Rank: 66
Totalscore: 227554
Posts: 245
Thanks: 420
UpVotes: 281
Registered: 15y 94d
shadum`s Avatar







Last Seen: 137d 9h
The User is Offline
RE: Scoring
Quote from dloser
Mar 07, 2011 - 07:19:50

@shadum: No offence, but your interpretation of the current scoring doesn't seem to make sense to me. First of all, the scores are not normalised at all. In fact, I'd say the opposite. The number of challenges and the average %solved are explicitly added to score sites differently according to their "difficulty" and size. Normalising would mean making site_score the same constant for each site (or at the very least removing the number of challenges from the definition). The exponent you highlighted in the score calculation only affects the "shape" of the mapping of a user's %solved to % of site_score.


Possibly I'm using the wrong term. :) My higher math, even my not so higher math, is pretty shaky. I realize that the bit I highlighted really only affects the "shape" of the scoring, but that is the problem, as far as that particular piece of the puzzle goes. On a site like SPOJ you have to solve hundreds of challenges to really see the scoring. That doesn't make sense. True, it should all even out in the end, but who is going to solve 2077 (or so) challenges to get to 100%? SPOJ has relatively difficult challenges, which really highlights the flaw in the mechanism.

Quote from dloser
Mar 07, 2011 - 07:19:50
Secondly, you say that the "idea behind weighting challenges by percentage solved seems to be to prevent people from solving a bunch of easy challenges on multiple sites and thereby racking up a lot of points for easy work." I believe that it is meant as an automatic way to reflect a site's difficulty in its score. Compensating for easy challenges is done by the exponentiation: low %solved gets low score per percentage point.


My statement was drawn from the first several posts on this thread, but that isn't really a major point. I think that using the site's average solved value is the attempt at reflecting the difficulty component. The exponentiation using the user's percentage solved on a site is the attempt at compensating for easy challenges, and that is the part that seems to me to be broken.

Quote from dloser
Mar 07, 2011 - 07:19:50
I would like to suggest that "we" forget about making the scoring reflect people's intelligence/skill/whatever. I don't believe there is any way to get close to it and it will always be subject to gaming. The fact is that the available data is practically meaningless...

I'd say score challenges according to difficulty and let that be all. Like shadum mentioned, why would the same challenge get a different score just because it's on a different site?


I am also skeptical about attempts to measure skill and intelligence, but I guess not as skeptical as you. I think there is a huge margin of error in most of those attempts, but it is still a decent goal. Ultimately, I don't think that philosophical difference matters, at least between you and me, since I'd also like a scoring that deals only with challenge difficulty as much as possible. I don't think that actually scoring each challenge individually would be easy, and it is possibly not desirable, so the suggestion is to accept a site's internal difficulty rankings and further compensate using some of the statistics that WeChall has available, like (subjective) site difficulty ratings and some of the solved averages. I think that if you are interested in skill and intelligence that is about as close as you are going to get, which probably isn't very close. :)

Except for the pow(...) part, I am fine with your formula-- not that it is in any way my decision :). The way you have the pow(...) set up is different than it is currently so it may work out better, though it does still do something I'd like to avoid. I think that using the scores returned by the linked sites, which almost all have internal difficulty mechanisms, and then weighting by site difficulty is adequate for fair scoring.

I am not wedded to my particular suggestion of algorithms. As I said before, I'd really like to see how some of these suggested algorithms compare.
Global Rank: 73
Totalscore: 213061
Posts: 148
Thanks: 206
UpVotes: 107
Registered: 16y 42d
Kender`s Avatar



Last Seen: 2y 13d
The User is Offline
RE: Scoring
Quote from shadum

Quote from Kender
Mar 06, 2011 - 21:52:46
My solution:
I propose we ignore all challenges over 200 per site for the score. That should take care of the problems you described.


It would be great if you could explain this in more detail. As is, I don't understand it at all. Do you suggest that 200 challenges is the most that will ever be scored? Aren't you de facto cutting the harder challenges right out of the ranking? At least in some cases, but not all, you could intentionally choose the harder challenges first, but why? This solution seems pretty at odds with the idea of measuring skill.

It seems like this would also weight the sites very strangely since all sites with 200 or more challenges are suddenly equivalent.

No, the sites would definitely not be equivalent. There is the manual basescore and the avg% that determine the score.
And like I've said a hundred times: the sitescore does not indicate the value of a site. It's only the wechall rankpoints you get for completing it.

The hardest part for people to understand seems to be that the points you get on WeChall when you solve a challenge on a site do not indicate a reward for solving that challenge.
It is a reward for going from x% complete to y% complete. So of course going from 0% to 1% gains you few points while going from 99% to 100% is a huge achievement.
Regardless of the amount of challenges, their difficulty according to other people or whatever; the first challenge you solve on a site will be the easiest and the one you solve to complete the site will be the hardest.
This is the only distinction that WeChall can make.

I still maintain that the # of challenges on a site should not directly influence your reward for completing it. It will do so indirectly by influencing the avg% though.
score = site_base_score * p_solved * p_solved * site_unsolved
where
site_base_score: admin-adjustable base score
p_solved: percentage of the site solved
site_unsolved: avg percentage unsolved of all people linked to the site
Some scaling can be applied to balance the influence of p_solved and site_unsolved

This works equally well for sites of 20 or 2000 challenges. For hard sites or easy ones. For sites with scaled or uniform scoring. For sites with ranks and usercounts and those without.
It also has the added benefit that new sites will have a large site_unsolved, which should attract players.
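
A minimal Python sketch of the formula above, without the optional scaling. The base score of 10000 and the averages in the usage note are made-up illustration values, not figures from WeChall.

```python
def score(site_base_score: float, p_solved: float, site_unsolved: float) -> float:
    """score = site_base_score * p_solved * p_solved * site_unsolved.

    p_solved:      fraction (0..1) of the site the user has solved
    site_unsolved: average fraction (0..1) left unsolved by all linked users
    """
    for name, value in (("p_solved", p_solved), ("site_unsolved", site_unsolved)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a fraction between 0 and 1")
    return site_base_score * p_solved * p_solved * site_unsolved


if __name__ == "__main__":
    base = 10000  # hypothetical admin-adjustable base score
    # Half-way through a hard site (90% unsolved on average)
    # versus half-way through an easy one (50% unsolved on average).
    print(score(base, 0.5, 0.9))
    print(score(base, 0.5, 0.5))
    # The challenge count never enters: 10 of 20 and 1000 of 2000 score the same.
    print(score(base, 10 / 20, 0.9) == score(base, 1000 / 2000, 0.9))
```

Because only the two fractions enter the calculation, the site size affects the result only through whatever it does to site_unsolved.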
Global Rank: 1
Totalscore: 760035
Posts: 431
Thanks: 491
UpVotes: 456
Registered: 14y 246d












The User is Offline
RE: Scoring
Quote from shadum
Mar 07, 2011 - 16:06:02

I realize that the bit I highlighted really only affects the "shape" of the scoring, but that is the problem, as far as that particular piece of the puzzle goes. On a site like SPOJ you have to solve hundreds of challenges to really see the scoring. That doesn't make sense. True, it should all even out in the end, but who is going to solve 2077 (or so) challenges to get to 100%? SPOJ has relatively difficult challenges, which really highlights the flaw in the mechanism.

From what I know this is not due to the exponentiation. The new scoring with exponent 1+100/#challs results in an almost linear progression for SPOJ. The (other) problem that is clear now is that even with this linear progression, you still only get very few points. This should be because the basescore_per_chall is very small. Usually one doesn't really notice this due to the relatively large site_basescore, but with a lot of challenges, this value becomes less important. So even if you get close to 100% on SPOJ, the score per percent point remains pretty small. (Another reason to dump site_basescore.)
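
As a rough check of the "almost linear" claim, here is a small Python sketch of the exponent alone. Only the expression 1+100/#challs comes from the post; the rest of the current WeChall formula is not reproduced here, and the challenge counts used (2077 for SPOJ, taken from the "2077 (or so)" above, and 50 for a small site) are illustrative assumptions.

```python
def exponent(num_challs: int) -> float:
    """Exponent applied to the solved fraction: 1 + 100 / #challs."""
    if num_challs <= 0:
        raise ValueError("num_challs must be positive")
    return 1.0 + 100.0 / num_challs


def shape(p_solved: float, num_challs: int) -> float:
    """Fraction of the site score reached at p_solved (0..1)."""
    return pow(p_solved, exponent(num_challs))


if __name__ == "__main__":
    for n in (50, 2077):  # assumed challenge counts, see lead-in
        print(n, round(exponent(n), 3), round(shape(0.5, n), 3))
```

For a site with about 2077 challenges the exponent stays just under 1.05, so half the site solved yields a little under half of the site score. For a 50-challenge site the exponent is 3, and the same 50% yields one eighth.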

Quote from shadum
Mar 07, 2011 - 16:06:02

I am also skeptical about attempts to measure skill and intelligence, but I guess not as skeptical as you.

Let me hope that for you. :)

Quote from shadum
Mar 07, 2011 - 16:06:02

Ultimately, I don't think that philosophical difference matters, at least between you and me, since I'd also like a scoring that deals only with challenge difficulty as much as possible.

I get that feeling as well. My comments were mainly to make sure we are talking about the same things. ;)

Quote from shadum
Mar 07, 2011 - 16:06:02

Except for the pow(...) part, I am fine with your formula-- not that it is in any way my decision :). The way you have the pow(...) set up is different than it is currently so it may work out better, though it does still do something I'd like to avoid. I think that using the scores returned by the linked sites, which almost all have internal difficulty mechanisms, and then weighting by site difficulty is adequate for fair scoring.

The important difference between my suggestion and the current scoring is that I would indeed set e to 1 almost always, effectively eliminating the exponentiation. I don't have an idea of the number of sites that have "proper" scoring versus those that do not, so perhaps there is indeed little reason to try and compensate for just a few not so properly scored sites.


Quote from Kender
Mar 07, 2011 - 20:50:24

The hardest part for people to understand seems to be that the points you get on WeChall when you solve a challenge on a site do not indicate a reward for solving that challenge.
It is a reward for going from x% complete to y% complete.

That is a nuance that will indeed be lost on most, I imagine. However, people not understanding it (or probably just not knowing it) suggests that it might not be the most user-friendly concept. Most will just do a challenge and see that they get more points for it when they are almost done with the site.

Quote from Kender
Mar 07, 2011 - 20:50:24

I still maintain that the # of challenges on a site should not directly influence your reward for completing it.

Sure, from your point of view (only percentages count) that seems logical. The question is what the scoring/ranking should and can mean. With a clear meaning there should be little discussion about the formula.
Totalscore: 317037
Posts: 98
Thanks: 105
UpVotes: 105
Registered: 14y 256d







Last Seen: 12d 8h
The User is Offline
RE: Scoring
edited - see post below
https://www.revolutionelite.co.uk/
Last edited by sabretooth - Mar 09, 2011 - 11:43:08
Totalscore: 317037
Posts: 98
Thanks: 105
UpVotes: 105
Registered: 14y 256d







Last Seen: 12d 8h
The User is Offline
RE: Scoring
Quote from Kender
Mar 07, 2011 - 20:50:24


The hardest part for people to understand seems to be that the points you get on WeChall when you solve a challenge on a site do not indicate a reward for solving that challenge.
It is a reward for going from x% complete to y% complete. So of course going from 0% to 1% gains you few points while going from 99% to 100% is a huge achievement.
Regardless of the amount of challenges, their difficulty according to other people or whatever; the first challenge you solve on a site will be the easiest and the one you solve to complete the site will be the hardest.
This is the only distinction that WeChall can make.




I have to disagree. In principle and theory, yes you are correct. It holds for any site where user < 100%. What happens, however, if a user is at 100% on a site but then a new challenge is added? Does this mean that in solving this new challenge (which might actually be easier than any challenge already on the site) they are rewarded greatly for solving it? Bad example perhaps as again, this focusses on the difficulty of the challenge. But for new challenges added constantly and the user going from 99.x% to 100% constantly it means a lot of points for very few challenges....
I understand where you are coming from, and to be totally honest I think it may be the most viable solution, but you need to take into account the above scenario.
Regards,
Ian


Edit: Apologies. I reposted rather than edited. My mistake
https://www.revolutionelite.co.uk/
Last edited by sabretooth - Mar 09, 2011 - 11:42:16
Global Rank: 1
Totalscore: 760035
Posts: 431
Thanks: 491
UpVotes: 456
Registered: 14y 246d












The User is Offline
RE: Scoring
It is, of course, in no way guaranteed that your first challenge is the easiest etc. It's just an assumption that is made due to lack of (or not using of) actual knowledge. So you could sweep the "new-easy" problem under that carpet. ;)

In any case, before getting those easy points, you'll first lose (most?/some?) of them. Perhaps more interesting with new challenges is what this decrease in points is supposed to mean. Did my skills decrease? Sure, I'm getting older, but come on!
Global Rank: 73
Totalscore: 213061
Posts: 148
Thanks: 206
UpVotes: 107
Registered: 16y 42d
Kender`s Avatar



Last Seen: 2y 13d
The User is Offline
RE: Scoring
Quote from sabretooth
Mar 09, 2011 - 11:40:54

Quote from Kender
Mar 07, 2011 - 20:50:24

Regardless of the amount of challenges, their difficulty according to other people or whatever; the first challenge you solve on a site will be the easiest and the one you solve to complete the site will be the hardest.
This is the only distinction that WeChall can make.

I have to disagree. In principle and theory, yes you are correct. It holds for any site where user < 100%. What happens, however, if a user is at 100% on a site but then a new challenge is added? Does this mean that in solving this new challenge (which might actually be easier than any challenge already on the site) they are rewarded greatly for solving it? Bad example perhaps as again, this focusses on the difficulty of the challenge. But for new challenges added constantly and the user going from 99.x% to 100% constantly it means a lot of points for very few challenges....
I understand where you are coming from, and to be totally honest I think it may be the most viable solution, but you need to take into account the above scenario.

Erhm, this is already the case on WeChall. I'm surprised you never noticed that no one ever goes above 100% :)
Every time anyone has an update all the scores are re-calculated.
If I go from 100% to 99% then I lose the same amount of points I got when I went from 99% to 100%.
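
The recalculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the quadratic shape, the maximum of 1000 points and the 99-challenge site are all hypothetical, and every challenge is assumed to count equally.

```python
MAX_POINTS = 1000.0  # hypothetical site maximum, not a WeChall value


def score_for(solved: int, total: int) -> float:
    """Score recomputed from the current solved fraction (assumed quadratic)."""
    p_solved = solved / total
    return MAX_POINTS * p_solved ** 2


if __name__ == "__main__":
    before = score_for(99, 99)          # site complete: 100%
    after_add = score_for(99, 100)      # a new challenge appears: 99%
    after_solve = score_for(100, 100)   # new challenge solved: 100% again
    print(before - after_add)           # points lost when the site grows
    print(after_solve - after_add)      # points regained by solving it
```

Since the score depends only on the current fraction, the points lost when the new challenge is added and the points regained by solving it are the same amount.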
Quote from dloser

Perhaps more interesting with new challenges is what this decrease in points is supposed to mean. Did my skills decrease? Sure, I'm getting older, but come on!

Well, if you solve the same old basic starter challenges on another dozen sites your score will increase. Does that mean your skills increased? Come on, it's only an approximation.

Shortest term / easiest solution to the SPOJ problem is to change num_challs to min(num_challs, 200) (or some other number). I do not like this.

Better solution is to exchange the current num_challs as difficulty indicator with the avg_unsolved% and make it count for more than the current num_challs.

Making the scoring linear so that you get the same amount of points regardless of how far you are on a site is not an option for me.
That would mean that a person doing the easiest 5 challenges on 20 different sites will rank higher than someone who has finished both Electrica and +Ma's.

Using the num_challs anywhere is also not an option for me.
Sites are free to structure their sites, scoring and ranking however they want and only report progress percentage to WeChall. Any other data has always been optional.

We cannot use the user-voted difficulty for sites due to several reasons:
- there are only a handful of votes.
- people are not sure whether they vote for how hard, how easy or how well balanced a site's difficulty is.
- it is unclear if it is about the average challenge difficulty on the site or the difficulty in completing the site.
- the votes are only over a limited set of challenges since most people have not completed the site.

So, our challenge is to arrange these three variables in a formula in such a way that everyone is happy:
- p_solved - The user's actual "points" on a site divided by the maximum number of "points" on the site. (some sites use challs as points, some have their own points system, others use rank, etc.)
- base_score - Manually adjustable per-site fixed number used for weighting in rare cases.
- avg_solved - The average of p_solved of all the WeChall users linked to the site.

Here's a starter:
2000 * pow(p_solved,2) * 1/pow(avg_solved,2)

This gives for example:
site            first   last      total
SPOJ            0.65    2722.09   2847989
HackThisSite    3.50    710.25    36401.12
Hax.tor.hu      3.28    326.26    8238.82
first: points for 1st solve, last: points for last solve, total: total points when site complete
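
The numbers in the table can be approximately recomputed with a short Python sketch. The totals work out to 2000 / pow(avg_solved, 2), with avg_solved as defined in the list above. The avg_solved values and challenge counts below are not stated anywhere in the thread: they are back-computed from the table and are assumptions, and every challenge is assumed to move p_solved by the same amount.

```python
BASE = 2000

# (avg_solved, number of challenges): both inferred from the table, not given.
SITES = {
    "SPOJ": (0.0265, 2092),
    "HackThisSite": (0.2344, 102),
    "Hax.tor.hu": (0.4927, 50),
}


def score(p_solved: float, avg_solved: float) -> float:
    """Starter formula with the average solved fraction in the denominator."""
    return BASE * pow(p_solved, 2) / pow(avg_solved, 2)


def row(avg_solved: float, num_challs: int) -> tuple:
    """Points for the first solve, the last solve, and the completed site."""
    total = score(1.0, avg_solved)
    first = score(1 / num_challs, avg_solved)
    last = total - score((num_challs - 1) / num_challs, avg_solved)
    return first, last, total


if __name__ == "__main__":
    print(f"{'site':<16}{'first':>8}{'last':>10}{'total':>14}")
    for name, (avg, n) in SITES.items():
        first, last, total = row(avg, n)
        print(f"{name:<16}{first:>8.2f}{last:>10.2f}{total:>14.2f}")
```

Under these assumptions the totals and last-solve values agree with the table to within rounding. The first-solve value for Hax.tor.hu comes out at about 3.30 rather than 3.28, so at least one of the inferred inputs is probably not exactly what was used.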

Which is still a long way from perfect, but you catch my drift ;)