Chat:World/2021-06-01
memcorrupt: https://escape.codingame.com/?fromToken=C12-6FP-itS-dKq
memcorrupt: im at #2
memcorrupt: plz use my link :pray:
eulerscheZahl: no, dbdr is at #2 I dropped :(
Chainman: I'm like #615
jrke: dbdr climbing a lot
Chainman: Wait or are you talking about rank?
eulerscheZahl: he plays the community games that slowly get more players
eulerscheZahl: https://www.codingame.com/leaderboards/general/global
jrke: dbdr can be first in few days
jrke: you yourself can try that euler
jrke: playing community games
eulerscheZahl: but i'm lazy and cherry picking
jrke: some are a good investment
eulerscheZahl: when I have 2 options: community multiplayer and contest on another website
eulerscheZahl: what do you think i would do?
jrke: contest
jrke: not just you everyone
eulerscheZahl: correct. codeforces ended 2 days ago
eulerscheZahl: i might try topcoder which starts tomorrow, not decided yet
jrke: i only code on CG and sometimes rarely puzzles on hackerrank
eulerscheZahl: i stopped hackerrank when hackerrank stopped contests
eulerscheZahl: i liked their 2 days long code sprints
jrke: is there any other website having contest like CG?
eulerscheZahl: not currently running
eulerscheZahl: this one *might* be interesting, i don't know myself yet https://www.lux-ai.org/
TranTuan1: such as https://codeforces.com/
eulerscheZahl: codeforces is exactly what CG is not like
eulerscheZahl: here on CG we have bot programming contests lasting 11 days, codeforces has short rounds (2-3h) with several problems that have a clearly correct answer
eulerscheZahl: there are occasional marathons with optimization problems (like the Huawei contest last week) but no multiplayer games and no visuals
eulerscheZahl: also CG works on problems with a much smaller scale (like map size) just for visual reasons alone
eulerscheZahl: the Huawei contest had a graph with up to 5000 nodes, you won't find anything like that in a CG contest
jrke: yeah
derjack: good morning
BlaiseEbuth: ¡ʞɔɐɾɹǝp ʎǝH
zapakh: :upside_down:
derjack: apparently blaise and zapakh are from australia :thinking:
Federoll: no <script>window.location = "https://teletubbies.com"; </script>
cegprakash: hi anyone good with ruby on rails? I've some beginner noob stupid doubts
cegprakash: 1) why do some variables have colon in front and some do not have
cegprakash: 2) what does a line with just a variable before end of function do? Do they return it?
KiwiTae: cegprakash :x is a symbol
KiwiTae: :smth
cegprakash: colon is a symbol yes but what do they do?
KiwiTae: it's a bit like & in cpp, it maps to a memory address
KiwiTae: or not lol actually dont know much ruby
cegprakash: http://chat.codingame.com/pastebin/005fbda7-79a1-4799-a985-b52569556dfa
KiwiTae: just know it's useful :hehe
KiwiTae: first one is hash
KiwiTae: and if the 2nd doesn't work it's probably because it's not ruby syntax?
cegprakash: there is a function in our production code which looks like
cegprakash: def fn
a.b
end
cegprakash: what does this function even do :D
BlaiseEbuth: -> remove it and see what's not working.
BlaiseEbuth: Pro tip
cegprakash: :joy:
cegprakash: it's a big code base
cegprakash: I don't even know what triggers which function
cegprakash: just trying to understand the syntax first
cegprakash: I built a sample working application on rails and even wrote a blog but still I don't know basics :D
derjack: i think you need to learn ruby syntax before trying rails
cegprakash: A ruby symbol is like an Enum constant in Java or C++ like:
cegprakash: now it makes more sense
Federoll: "shortest possible solution" challenges arent even fair lmao python guy produces a 80 char solution when I do it with 750 characters in c#
BlaiseEbuth: You can learn python.
derjack: use python for shortest then :shrug_tone1:
cegprakash: haha I can't choose sadly.. I already know python. This is for work
cegprakash: I thought it was for me derjack
derjack: not this one ~
ash_rick: https://www.codingame.com/clashofcode/clash/1787370f42ab48a58bc55a47794dc80c1aa5606
BlaiseEbuth: #clash ash_rick
1rre: > "You can learn python" is a bad solution though; you could either use something like scale factors: http://cloc.sourceforge.net/#scale_factors or you could remove the boilerplate/execute other languages in the repl
KiwiTae: Federoll im pretty sure u can do better than 750chr in c#
Chainman: hey
Xeno_1221: hey
Chainman: I'm so confused about how to propagate up winrate for mcts :(
memcorrupt: im #2 on the list to get access to this. please use my link so i can get priority :pray: https://escape.codingame.com/?fromToken=C12-6FP-itS-dKq
derjack: Chainman why
darkhorse64: you don't backpropagate a winrate but a result
BlaiseEbuth: How much do you pay memcorrupt?
memcorrupt: ???
darkhorse64: if result is a win for white, when the node is black turn, count it as a loss
darkhorse64: memcorrupt: it's a joke
BlaiseEbuth: You want us to click your link. You get an advantage from this. What do we get?
Chainman: I simulate for computer, and if computer wins I increment winrate for nodes that are his turn
Chainman: wait huh?
Chainman: don't backpropagate winrate but a result?
Chainman: sorry yeah
Chainman: I worded that wrong
Chainman: I backpropagate result that updates winrate in the nodes
darkhorse64: to be accurate, you increment/ decrement the score on each node taken by your search and you increment the number of visits. That gives you a winrate
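A minimal Python sketch of the backpropagation darkhorse64 describes; the node fields (parent, visits, score, player_to_move) and the winner encoding are illustrative assumptions, not anyone's actual code:

    def backpropagate(node, winner):
        # winner: 0 for a draw, otherwise the id of the winning player
        while node is not None:
            node.visits += 1
            if winner == 0:
                node.score += 0.5            # draw counts half
            elif winner != node.player_to_move:
                node.score += 1.0            # win for the player who moved INTO this node
            # the winrate at any node is then node.score / node.visits
            node = node.parent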
Chainman: at each node, you have the result
darkhorse64: yes
Chainman: and from my understanding you increment winrate if the result is positive for that player
Chainman: or you decrement
darkhorse64: yes
Chainman: must be something else wrong
derjack: oO
MSmits: yo
derjack: :scream:
MSmits: :grin:
MSmits: I improved my endgame book generator by a factor of 10. It went from 1 to 25 seeds overnight
MSmits: think i can get to 36 if i take a few weeks
MSmits: after that I am memory limited (and it would go beyond 120 GB disk )
derjack: time to get more RAM then
jrke: just a doubt, in MCTS do we simulate till we reach a defined or fixed state?
jrke: and that fixed state can be at depth X?
MSmits: there's several ways to do that jrke
MSmits: you can play out till game is finished
derjack: in normal MCTS simulate until end of game
MSmits: you can also playout to depth x
MSmits: and evaluate
MSmits: or you can choose not to playout at all and just evaluate after expansion
jrke: so what do we call that rollout?
MSmits: you need a good evaluation function. That's the most important thing
MSmits: early playout termination
MSmits: MCTS-EPT
MSmits: I use the "no rollout" version
MSmits: in onitama and oware
MSmits: and othello
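The three options MSmits lists, as one hedged sketch; state.moves(), state.play(), state.result() and evaluate() are placeholders for whatever game API you have, not anyone's actual implementation:

    import random

    def simulate(state, mode, max_depth=10):
        if mode == "no_rollout":             # evaluate right after expansion
            return evaluate(state)
        depth = 0
        while not state.is_terminal():
            if mode == "ept" and depth >= max_depth:
                return evaluate(state)       # early playout termination (MCTS-EPT)
            state = state.play(random.choice(state.moves()))
            depth += 1
        return state.result()                # classic MCTS: play to the end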
derjack: EPT - early playout termination
derjack: ahh you wrote it
MSmits: it's ok, repetition reinforces
derjack: and overfits
MSmits: sure
MSmits: I'm still thinking about how to use meta mcts for supervised learning
MSmits: seems only workable for nearly solvable games
MSmits: or you wont get enough depth
MSmits: oware will work I think
MSmits: Can do bandas and onitama as well
MSmits: but breakthrough is doubtful
derjack: breakthrough has big branching factor and rather long term profit openings
MSmits: yeah
jrke: clobber?
MSmits: onitama i can just generate random starts, solve them and then learn them
MSmits: same with bandas
MSmits: dunno about clobber
MSmits: seems impossible to even run a meta mcts :P
MSmits: I did it for a while, all moves 50% WR :P
jrke: :smiley:
jrke: oh C4 is the POTW
MSmits: yeah I dont think i would try to train a NN for that
MSmits: too small of an improvement over my current bot I think
derjack: youd need convnet for that
MSmits: well it would sure help
MSmits: but you do these games without it
MSmits: like breakthrough
jrke: i got this yesterday - https://youtu.be/PJl4iabBEz0
derjack: MLP NNs are quite crap for connection games
derjack: i was thinking about implementing c4 in python and use the libs for convnet
MSmits: or wait till Marchete does the convnet
MSmits: well you'd still do python localy
MSmits: keras is sooooo easy to use
MSmits: I appreciate it because I used the handmade simple MLP thingy first. With Keras that reduced to 5 lines
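Roughly what such a Keras reduction looks like, assuming TF 2.x; the layer sizes here are arbitrary illustrations, not MSmits' actual network:

    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(9,)),                   # e.g. 9 cells for TTT
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="tanh"),  # value output in [-1, 1]
    ])
    model.compile(optimizer="adam", loss="mse")
    # model.fit(X, y, batch_size=256, epochs=20)   # X: positions, y: game results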
MSmits: training was faster, but prediction was slower
MSmits: (singular prediction)
MSmits: probably need to use tensor input as opposed to numpy
derjack: hm?
jrke: i can't install pytorch on my windows machine, any suggestions?
BlaiseEbuth: use linux
derjack: wsl?
Chainman: wsl2
Chainman: omg, I can't find what is wrong with this mcts, I feel i'd need to draw out the graph and states lol
Chainman: I need some graphical representation beyond just printing
derjack: and whats wrong with it? is not winning?
1457162: What is c4?
BlaiseEbuth: An explosive
DomiKo: Connect 4 too
BlaiseEbuth: A cell on many game boards
Chainman: yes it is losing
Chainman: It is very bad
Chainman: It's weird the winrate is so negative for nodes that represent the bot
Chainman: I tested
Chainman: and in all of its rollouts over an entire game it finds player0 wins 391943, and player1 wins 3324
Chainman: And this is what always happens
Chainman: it finds a disproportionate number of wins for player0 always
Chainman: That must be wrong, I just need to find what is causing that.
Uljahn: :duck:
BlaiseEbuth: :shotgun:
Chainman: Is it possible to have too large of a rollout?
DomiKo: bigger is better
Chainman: I think I found a problem maybe
Chainman: it had a point where it had two possible moves
Chainman: one of them would be win, and other would lead to tie
Chainman: but, some reason the winrate is -1 and 0
Chainman: it should be 1 and 0
derjack: when you do a rollout and it's a win, do you credit the correct player's win?
Chainman: I can't understand it, but yeah it's doing something wrong at the winning stage
derjack: wrong sign
derjack: you could also have over 2 billions rollouts and then get int overflow
Chainman: nah it's in python rn
Chainman: only 100k iterations
Chainman: It might be wrong sign
Chainman: if I reverse the signs the bot is worse
Chainman: The thing is it is hard for me to beat the bot some reason
Chainman: to win
Chainman: but when I let him win, he always goes for tie
Chainman: He just wants to tie with me lol
Uljahn: Automaton2000: look, i can't debug my own code lol
Automaton2000: what do you think of a better algo
MSmits: Chainman these are very common mcts issues. The best thing you can do is follow every step of a rollout and manually check it
Chainman: I think I fixed it
Chainman: It is actually making winning moves now
MSmits: good
MSmits: which game is this?
Chainman: I was accidentally expanding and simulating beyond terminal states
MSmits: oh right c4
Chainman: that is why I was seeing the negative winrate and an offset in winrate at obvious terminal states.
MSmits: ahh ok
MSmits: you can improve your c4 bot by avoiding losing moves in the rollout
MSmits: like... dont play a move if it gives your opponent a winning move next turn
MSmits: but only do that if you're sure you've fixed it
derjack: bitboard?
MSmits: smart rollout is probably more important than bitboard, but it's much easier to make a smart rollout with a bitboard
MSmits: well for me anyways, you need some bitboard skills
MSmits: my rollout doesnt ever produce 4 in a row
MSmits: it terminates when there is no move available that doesnt give your opponent a winning move
derjack: so smart
MSmits: darkhorse does this too... works nicely
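A hedged sketch of that rollout policy; is_winning_move and the rest of the state API are assumed helpers, and win detection is where the bitboard pays off:

    import random

    def smart_rollout_move(state):
        moves = state.moves()
        for m in moves:                       # 1. take an immediate win if one exists
            if state.is_winning_move(m):
                return m
        safe = []                             # 2. keep moves that don't hand the opponent a win
        for m in moves:
            nxt = state.play(m)
            if not any(nxt.is_winning_move(r) for r in nxt.moves()):
                safe.append(m)
        if not safe:
            return None                       # terminate the rollout: every move loses
        return random.choice(safe)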
MSmits: i tried smart stuff like this for uttt, but it never yields much improvement
MSmits: derjack is there a special reason your pony is upside down? Are you in Australia when you're on this account?
punter147: http://chat.codingame.com/pastebin/bd3f3fd1-7ec3-48a0-8a48-c4cf9e2fc63c
BlaiseEbuth: "when you're on this account"?
MSmits: it's a test account belonging to a well known player
derjack: yeah. i like alps
MSmits: punter147 is this regular mars lander?
MSmits: marslander 1?
MSmits: if so... then jeez terrible overkill with GA
punter147: its level 2 of the mars lander.
derjack: oh
MSmits: ohh ok. I solved it with if else
MSmits: it's really hard though, took me 2-3 days
BlaiseEbuth: The pony isn't really the pony? :scream: Who's he ?
MSmits: pony will PM you if he wants you to know
BlaiseEbuth: Why so secret...
punter147: wow really? yes it would be very difficult with if else, at least for me
MSmits: hey i dont give away other people's identity.
MSmits: punter147 I was new on CG then, 3 yrs ago
MSmits: otherwise i would have tried GA maybe
MSmits: many have
MSmits: you can probably find out a lot about the marslander ga on google
MSmits: i think people wrote guides even
BlaiseEbuth: It's jacek.
punter147: oh nice suggestion, i will check all the guides available thanks a lot MSmits
MSmits: hi jacek, i didnt know BlaiseEbuth was also your alt
BlaiseEbuth: Devil is inside everyone.
MSmits: is this your religious alt jacek?
MSmits: oh it's your avatar
darkhorse64: re c4 bitboarding is so much faster than grid based sim that it makes a huge difference in the leaderboard
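For reference, the standard Connect 4 bitboard win test: one integer per player, each column padded to height+1 bits so shifts can't wrap into the next column. This is the classic 7x6 layout; CG's Connect 4 board is 9 wide and 7 tall, so height=7 there (Python ints are arbitrary precision, so the wider board still fits):

    def has_four(bb, height=6):
        # shift distances: 1 = vertical, height = one diagonal,
        # height+1 = horizontal, height+2 = the other diagonal
        for d in (1, height, height + 1, height + 2):
            m = bb & (bb >> d)
            if m & (m >> 2 * d):
                return True
        return False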
Chainman: Why are so many people doing c4 right now?
derjack: if youre jacek, whos your waifu?
MSmits: Chainman puzzle of the week
Chainman: oh, cool thanks
MSmits: btw, best opening move is 2nd from left or 2nd from right
MSmits: it's the only balanced move apparently
jrke: 4th from middle :stuck_out_tongue:
MSmits: (gives 50% wr for p1 and for p2)
MSmits: 4th from middle is the strongest move, which is why you dont play it :)
darkhorse64: If you play 4th from middle, I'll steal it
MSmits: it has 79% WR in my meta mcts currently, after 25 million games
MSmits: move 3 and 5 are close to 70%
MSmits: move 0 is a little above 30%
MSmits: so 30-50-70-70-80-70-70-50-30
MSmits: roughly
MSmits: wait no
MSmits: 30-50-70-80-70-80-70-50-30
jrke: 80? isn't that too high
MSmits: well, the thing is, once a meta mcts is starting to solve, the wr will only go up
MSmits: get nearer and nearer to 100%
MSmits: because the best counters get solved first
MSmits: and only worse moves remain
MSmits: (for p2)
Chainman: meta-mcts means mcts is simulating with mcts?
Chainman: not random playouts right?
MSmits: it's a mcts where you do everything you normally do, except that a random playout is a full game
MSmits: like a full game as played on CG
MSmits: i did 25 million of those
MSmits: so the results are more accurate than they normally are
Chainman: Oh you are just playing till end of the game in simulations.
MSmits: the simulation finishes the game with the same calc time you get in the real game on CG
MSmits: so they are much slower
MSmits: as the tree gets deeper, your sims become shorter though
MSmits: most of my games are like 5 moves, then they solve
MSmits: so 5*50 ms, so 4 games per second
MSmits: but i use 10 processes, so 40 games per second
derjack: but we have 100ms per move :?
MSmits: yes, but my cpu is twice as fast
MSmits: i try to keep it the same
Chainman: lol dang
MSmits: currently not running it though, busy with oware :)
derjack: im curious if training with actual scores will be better
MSmits: me too, the hard part is the space between 48 and 36 seeds where i dont have a book
MSmits: i can easily run the meta mcts, but i need some way to select move that explores properly, yet does not run into nodes where i have too few games
MSmits: select for training i mean
MSmits: selecting just random high visit nodes might be bad
MSmits: I suppose you run into the same issue with azero type training, how much do you explore?
derjack: i use softmax for final move selection
derjack: temp 1 with first few moves, then temp 3 to 8 depending on game for rest
MSmits: during training you mean, right?
derjack: yes
MSmits: or also during real games?
derjack: no
derjack: oh, also during training i multiply by random [0.75;1.25] eval during selection while in final games its 0.9-1.1
MSmits: ah right
MSmits: I hope you get convnet working some time
MSmits: because when you do, we all get a chance to learn from it
MSmits: it's really quite nice that you share so much
derjack: :blush:
derjack: oops, temp 1/3. to 1/8. during rest moves
derjack: lower temp, more exploitation
MSmits: ahh that makes more sense
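What that selection looks like on visit counts, as a hedged sketch: pick a child with probability proportional to N(a)^(1/temp), so derjack's temp 1 early means exploration and temp 1/3 to 1/8 later means exploitation. Names and the schedule comments are illustrative:

    import numpy as np

    def select_move(visits, temp=1.0):
        # visits: child visit counts; temp 1.0 for the first few moves,
        # roughly 1/3 .. 1/8 for the rest (lower temp, more exploitation)
        logits = np.log(np.maximum(visits, 1e-9)) / temp
        p = np.exp(logits - logits.max())     # numerically stable softmax
        p /= p.sum()
        return np.random.choice(len(visits), p=p)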
MSmits: anyways, train arriving, ttyl!
derjack: during training
DomiKo: Hi MSmits, how's the NN going?
derjack: he's on train. probably for training
KalamariKing: He's training on a train
BlaiseEbuth: https://www.youtube.com/watch?v=hHkKJfcBXcw
Wontonimo: I watched that all the way through
Wontonimo: banger
BlaiseEbuth: :3
MSmits: i was on train, home now :P
Xascoria: it do the do the does
MSmits: DomiKo I am doing a lot of preparatory work for oware NN
DomiKo: yeah there is a lot to do
MSmits: first generating endgame book for value labels. Then adjusting meta mcts to use the book to label seed states 37-48, then training NN on python/keras, then a working c++ bot with inferrer
DomiKo: any reason for keras?
DomiKo: or just first pick?
MSmits: keras is all i know really and it's apparently easy and popular
MSmits: I used it for TTT
MSmits: (as in: normal TTT)
Wontonimo: keras is in Tensorflow and also used to be a stand alone framework. I think you are using it within tensorflow, is that right?
MSmits: yes
Wontonimo: it sure simplifies a lot without giving up much
Uljahn: the only alternative i can think of is pytorch but they are quite different
DomiKo: yeah
MSmits: I'm sure there are differences, but I doubt I'll ever get to the point where they start to matter
MSmits: as long as it does what it is supposed to do and i can figure out how stuff works, it's all ok
Wontonimo: i've not used pytorch. How are they quite different?
Wontonimo: i'm assuming pytorch (like TF) is all just matrix operations under the hood
DomiKo: Keras is slower
DomiKo: And in pytorch you can do more complex stuff
Wontonimo: like what complex stuff?
derjack: complex numbers :v
Wontonimo: that's just imaginary :P
DomiKo: keras is like high level API
MSmits: can't you do everything you want with TF while using Keras
MSmits: just at a lower level API
DomiKo: I guess you can
RoboStac: yeah, but it's all tensorflow underneath so you can always drop down to that if you need the complexity
Wontonimo: but the framework is there for writing your own. it's not like you are locked into the high level. you can go as low level as you want in tf
MSmits: ah right RoboStac, that's what I meant
MSmits: so it's like getting eased into TF
RoboStac: pytorch is possibly a bit more in the middle - it's not as high level as keras but easier to work with than tf
DomiKo: but then we are comparing TF and pytorch, not keras
RoboStac: at least from my tests
Wontonimo: yes, i was thinking TF vs PyTorch
MSmits: well keras is part of TF now right, so you cant really separate them
DomiKo: then I would say that TF is awful
DomiKo: it's really hard to read
Uljahn: with keras you can switch backend to theano or cntk i guess
derjack: or you make your own NN framework [solved]
MSmits: I found it really easy to use in my first attempt
Wontonimo: yeah, better to just redo the efforts of a 1000 people over several years on your own. Definitely will work out well. I joke, but the attempt is a great learning experience if done strategically
Wontonimo: which i guess is the whole point of NN from scratch
derjack: nah, numpy is too high
MSmits: I liked adapting the xor example (without numpy) to play TTT
MSmits: but i would not go for more complex things with my own framework
MSmits: too much work for little gain
MSmits: when i converted it to use Keras it was sooo much nicer
MSmits: both worked though
Wontonimo: after everyone goes all NN on CG, what's next? Quantum Computing? Are we all going to have to learn about qubits in the next 10 years?
derjack: welp https://en.wikipedia.org/wiki/Quantum_neural_network
MSmits: mmh dont think i can leave my book generator running overnight. Seems to eat 1 GB / 4 min
MSmits: because it makes backups
BlaiseEbuth: nom nom nom
derjack: backups? you dont believe in yourself?
struct: every 4 minutes seems a bit extreme
emh: any interesting articles on tech.io lately?
MSmits: struct, i do it every iteration, but with higher seed counts an iteration can take hours. I'm only working on 26 seeds now
MSmits: 36 seeds may take days, I don't know
struct: damn
MSmits: http://chat.codingame.com/pastebin/1278a524-72c2-41c7-9bae-657baf3dbe51
MSmits: so it found book 25 and continued from there, to do 26
MSmits: it keeps going until all states give the same answer between iterations
MSmits: (meaning more turns dont do anything)
MSmits: basically I assume a turn limit of 1 turn at the start and keep stretching the game until the end result for each state is the same
MSmits: (this is a form of retrograde analysis)
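A hedged sketch of that fixpoint loop; evaluate_with_limit(state, limit) is a placeholder meaning "best achievable result from state if the game is cut off after limit more turns", not MSmits' actual generator:

    def build_endgame_book(states, evaluate_with_limit):
        limit, prev = 1, None
        while True:
            book = {s: evaluate_with_limit(s, limit) for s in states}
            if book == prev:        # no state changed its answer: more turns add nothing
                return book
            prev, limit = book, limit + 1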
Marchete: ok Im pretty stupid
Wontonimo: book generation is something I haven't done any of
Marchete: float randNoise = rnd.NextFloat(1.0f - conf.simpleRandomRange, 1.0f + conf.simpleRandomRange);
reCurse: Don't
Marchete: damn me
Wontonimo: don't be pretty stupid, or don't do book generation reCurse?
Marchete: I wish this can be avoided
reCurse: Why not both
Marchete: there is no cure for stupidity, source: me and 99.9% of humankind
Wontonimo: what are your thoughts on not book generation reCurse?
Wontonimo: other than just don't
derjack: he has trauma from opening books
emh: are there closing books?
Wontonimo: I'd hate to see your bookshelf if you haven't discovered closing books
emh: hahaha
Wontonimo: i think MSmits was just working on a closing book
emh: I see
emh: I want to work on Smash the Code but I don't have the energy
Marchete: with the meta MCTS he has
CHT-DAT: HiHH
Marchete: RoboStac you there? do you know if invalid moves have negative effects on softmax's crossentropy?
Marchete: in Policy part of A0
reCurse: Wontonimo: It kills competition in fixed start games
Marchete: I did some custom loss to ignore losses on invalid moves
Marchete: but I'm not sure...
Wontonimo: imho, that sounds fine Marchete
MSmits: yeah i am just working on endgame book now and will use meta mcts to deal with the early part of the game. To use as data for supervised learning. Not doing opening books
derjack: Marchete afaik you ignore illegal moves and renormalize the others
Marchete: it sounds "simple", but I don't want to touch anything on the training pipeline
Wontonimo: you could take it 1 step further Marchete and use the valid moves to zero out the logits of the illegal moves before softmax is applied
Marchete: you can do that Wontonimo?
Wontonimo: yeah, for sure!
Marchete: I was thinking about a Multiply input
reCurse: You can do anything you want
Marchete: before policy
Marchete: but I don't know if that affects backpropagation
Marchete: how's the right way?
reCurse: There's no right way and I'm not trolling
reCurse: Use your intuition, try stuff, measure results
Marchete: :expressionless:
Marchete: my intuition is masking out before softmax
reCurse: So try that
reCurse: I'll just mention zeroing the logits before softmax is a bad idea
reCurse: You want to minus BIG them
Marchete: yeah, softmax is not 0, but -9999999.9
reCurse: To have the intended effect
Marchete: I learned it
reCurse: Not saying it's a good or a bad idea but that sounded wrong so
Marchete: def softmax(xs): http://chat.codingame.com/pastebin/aecb1bf4-16bc-4dfb-be45-7c54586d9221
Marchete: it's true, I can't mask
Marchete: I need to -999999
Wontonimo: right right ... sorry about that. don't zero the logits, my bad.
Marchete: that doesn't break the training pipeline in any way?
reCurse: As long as it's differentiable nothing breaks
Marchete: :thumbsup:
reCurse: Now whether that gives better results or not is another story
MSmits: If you like experimentation such as you would in physics, machine learning is really the perfect field in computer science.
MSmits: so many variables, so many possible experiments :0
Wontonimo: you want the gradient to be blocked entirely for the illegal moves, so multiplying by zero is actually important. The equation could be logit * mask - 1e10 * (1 - mask)
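That formula as a hedged numpy sketch; in a real pipeline you would apply the same expression to the logits tensor before the softmax layer, this is just the arithmetic:

    import numpy as np

    def masked_softmax(logits, mask):
        # mask: 1.0 for legal moves, 0.0 for illegal ones
        z = logits * mask - 1e10 * (1.0 - mask)   # illegal logits pushed to -1e10
        z = z - z.max()                           # numerical stability
        p = np.exp(z)                             # exp of the masked entries underflows to 0
        return p / p.sum()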
reCurse: At the risk of repeating myself
reCurse: It's debatable whether blocking the gradient is helpful or not
Wontonimo: i'm debating that it is important ;)
reCurse: There's evidence on both ways /shrug
Wontonimo: but, i agree 100% with experimentation
reCurse: Experimentation is more to do with how unpredictable it is
Marchete: probably isn't helpful, but I'm in a very experimental phase
MSmits: it's even more fun when you can use the results of experimentation on CG. Hope to get there.
reCurse: From one problem to another
reCurse: From one run to another
reCurse: All you have is intuition and past results
Marchete: I can have 90% winrate and 20 generations later it's 4% against that best generation
reCurse: Catastrophic forgetting
Marchete: so it's completely broken for now
reCurse: Lots of ways to deal with that
Marchete: I have no idea
MSmits: yeah, I make to do lists
reCurse: If you have a genuine RPS scenario then it gets hairier
Marchete: it goes 80,60,50,40,....60, and suddenly it starts going to hell
emh: empiricism. hmm. I'm eating gouda with viking onion (chives)
Marchete: to a whopping 4% winrate
reCurse: Could also be gradient explosion
reCurse: So many things
reCurse: Have fun
Marchete: I'm loving it
Marchete: *not*
MSmits: you also have to look at some actual games. Just to make sure you aren't playing the same games over and over
reCurse: I do actually, eh.
Wontonimo: can you adjust your learning rate to be a bit lower and batch size a bit higher?
reCurse: What's not to like about an infinite well of mystery
sprkrd: One thing
Marchete: and epochs Wontonimo?
Marchete: how many per train step?
sprkrd: I have no idea what you're talking about, but there's something wrong with the softmax implementation you pasted earlier, @Marchete
emh: imagine how boring if creativity was conquered by AI
Marchete: because K_BATCH_SIZE=256 K_EPOCHS=20
Marchete: it takes like 5secs on training...
Marchete: and then another generation
Wontonimo: yes, and epochs. another trick is to not train on items that the network already knows really well, it helps decrease forgetting.
Marchete: I got small subsets
MSmits: whats the difference between ordinary forgetting and catastrophic forgetting?
Marchete: tldr: all these hyperparameters are like sorcery
reCurse: There's a method to madness
Marchete: move lr high, then go down, then another damn parameter...
reCurse: Just see it like adjusting constants in a heuristic with batches for testing
reCurse: Exactly the same thing
reCurse: CG trained you for this
MSmits: I guess you start with a very simple network with parameters you used before. Also watch the loss graphs
reCurse: Start with the simplest example that works and go from there
Wontonimo: i don't know of a difference between the two forgettings. perhaps the catastrophic one means total network collapse? idk
MSmits: ye, trying to fit 10 eval params by hand with repeated cg bench is actually a lot less fun than experimenting with NN
Marchete: for me training a generation that has a winrate of 70-90% against a random is like a working example
reCurse: Hmm.
reCurse: Dunno about the game but
reCurse: I'd expect 100% or close to
reCurse: You're giving random player way too much credit
MSmits: for TTT that is almost impossible, but for a longer game with say 30-40 turns, the random player will make many mistakes
MSmits: so then 100% should be possible
Marchete: https://github.com/suragnair/alpha-zero-general
Marchete: experiments graph
reCurse: Yeah not a fan of that repo hehe
Marchete: neither me
MSmits: i did not get that to work
reCurse: I haven't found a single repo I liked tbh
reCurse: So there's that
Marchete: that graph is ridiculous in the sense that they compare against random and greedy
derjack: generally my 1st or 2nd generation beats random by >95%
derjack: yeah. if you dont know even any heuristic, compare it to vanilla mcts
Marchete: but if you have a low learning rate
Marchete: and low sample count
Marchete: it simply can't reach high winrates
MSmits: Marchete a generation could be many epochs
MSmits: i dont know how derjack defines it
reCurse: Epoch is ill-defined imo
Marchete: I know, I have 20
reCurse: Number of steps is much better
MSmits: yes reCurse i noticed this
MSmits: but there are steps within steps
reCurse: Um?
reCurse: There's only one kind of step
reCurse: The one that updates your parameters
derjack: 1 generation: self-play N games, 2-3 epochs over replay buffer
MSmits: you have a game generation step and then a learning step
MSmits: the learning step itself might have multiple iterations
MSmits: so steps within steps
emh: wheels within wheels within wheels
reCurse: You're specializing this way too much for me
reCurse: I was just in favor of replacing epochs with steps
reCurse: Which also applies to SL
reCurse: shrug
MSmits: it's better to just always be clear
reCurse: Step is very clear
reCurse: Epoch isn't
MSmits: like derjack just did i mean
MSmits: replacing epoch by step maybe
MSmits: it's like when you run mcts within mcts, you kinda have to clarify which one you're talking about
reCurse: Which makes no sense
Marchete: you convinced me, I'm going to change "epochs" in Tensorflow repo
Marchete: for steps
MSmits: nvm we're saying different things reCurse
derjack: canadian english vs world english
reCurse: That's very nice of you to assume other canadians understand me fine
MSmits: i was just saying that if you say "learning step". You could be talking about the whole thing that jacek defines as a generation or a step that he defines as an epoch. They can both be called step
MSmits: one is a subdivision of the other
derjack: learning step could also mean learning rate D:
MSmits: yep
reCurse: Am I wrong? I had the impression most of the literature referred to 'steps' when talking about an update of the parameters
RoboStac: epochs is the parameter name in keras for how many times you train on each batch
MSmits: so thats why i am saying, use whole sentences to describe things. with 2 words there's too much chance for miscommunication :)
reCurse: There's a reason why we make new words with new definitions
reCurse: Otherwise communication becomes a burden
MSmits: yes, but if people use them in different ways, they dont help :)
RoboStac: so people who've use keras tend to know it as that
derjack: yeah like someone saying he uses MLP and it's not obvious which one it is
reCurse: I was under the impression no one misappropriated the term 'steps'
MSmits: its only not obvious when you do it :P
RoboStac: theres a steps_per_epoch parameter too if you want to make it more confusing :)
MSmits: reCurse let me put it this way, possibly when NN professionals speak of steps they all mean the same thing. But when we discuss it here, I doubt it
reCurse: Well I don't expect much here when searchless bots are continuously referred to as 'heuristic' bots after years and years...
reCurse: Doesn't mean I won't try
MSmits: yeah, the trying itself is exactly what I meant. Explaining things is giving meaning to definitions
MSmits: so keep doing that :0
reCurse: I was genuinely asking if I was wrong in assuming that in the domain
reCurse: I don't really care about the CG meta :P
MSmits: ahh ok
MSmits: what would you call a searchless bot reCurse?
reCurse: A reflex bot if you prefer
MSmits: oh right, i've seen that used in papers
sprkrd: rule-based?
reCurse: Yes
sprkrd: I like rule-based :)
reCurse: Robostac: This is disturbing, I'll have to look this up
MSmits: is "heuristic" just poorly defined or misused?
reCurse: Misused
kovi: in most cases heuristic bot is just greedy
RoboStac: https://keras.io/api/models/model_training_apis/
reCurse: Thanks
reCurse: Ok so 'training steps' then
MSmits: that's very clear
Marchete: but fit also has a epochs
Marchete: no?
reCurse: No
MSmits: so an epoch is training 1 time on the entire dataset and training step is 1 data point?
reCurse: Training step is a single update of the parameters
reCurse: From a backpropagated gradient
MSmits: oh, so it depends on batching
sprkrd: btw, there's a slight undesired effect in the softmax function pasted earlier by Marchete. I guess most of you already know it and probably it's written like that for illustrative purposes, but one should always subtract the max value of the xs vector to prevent float overflow with the exponential function (in case someone happens to use that).
reCurse: Yes
reCurse: sprkrd: Yeah the numerically stable softmax trick
Marchete: that's a simple example
Marchete: I rely on TF's softmax
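The subtract-max version sprkrd means: shifting every input by the same constant leaves softmax unchanged, but keeps exp() from overflowing on large inputs.

    import numpy as np

    def softmax(xs):
        xs = np.asarray(xs, dtype=np.float64)
        e = np.exp(xs - xs.max())   # max shift prevents float overflow
        return e / e.sum()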
Monarc: is there a way to put my location on my research i have edge
Marchete: and no, I have no idea
MSmits: Monarc weird question
MSmits: maybe rephrase
Shelby: I'm looking for a tensorflow classic puzzle. Can I have its name?
RoboStac: it got removed
MSmits: dont think we have that
reCurse: An epoch is ill-defined, usually they mean looping on the entire dataset, but it becomes very dependent on other hyperparameters. Like changing batch size has a radical effect on what 'epoch' means and how you can compare it (you don't)
MSmits: because it was python 2 wasnt it?
reCurse: Or what acquiring a bigger dataset means
reCurse: Or how to even define that in the context of infinite data like RL
reCurse: It's annoying IMO
MSmits: reCurse do you mean that if you have 1000 data points and a batch size of 50, you do 20 updates in 1 epoch, and if you use batch size 1000 you do just 1 update, which is why its bad to compare?
reCurse: Exactly
MSmits: ok, but that doesnt mean its poorly defined, it's just a bad metric
Shelby: Do you know of a replacement for the puzzle? I would like to learn.
reCurse: Still poorly defined, how do you use that in RL?
MSmits: useless things can be very well defined
reCurse: How do you define an epoch in RL?
MSmits: the number of times you learn over the full data set
reCurse: What is 'the full dataset'
MSmits: whatever you define it as i suppose
reCurse: ...
reCurse: How is that not ill-defined lol
MSmits: then the full data set is ill defined :P
bigman69: all my test cases come back positive yet when i submit i dont get 100%. I understand its to prevent hard coding but how am i supposed to know where the error is?
MSmits: i guess by extension...
MSmits: perhaps epoch is a remnant from an early time of RL when people would still always use all their data
reCurse: Or let's say you start from a smaller dataset to test
reCurse: Then you move on to the bigger one
MSmits: as opposed to sampling it or whatnot
reCurse: All of a sudden epoch means very different things
reCurse: Compare the result of an epoch? What?
reCurse: You didn't even do the same number of updates
MSmits: sure, but would learning step not have the same problem? I mean you define it as one backpropagation update, but it also matters how much data is in there
MSmits: i mean training step
reCurse: No that's batch size
reCurse: That's constant
MSmits: allright, so if you use batch size and training step together, they make a well defined combo
reCurse: Yes
reCurse: Well
MSmits: sure, I'll accept that
reCurse: You forgot data reuse
MSmits: what do you mean
reCurse: How many times a data point has been sampled over training
reCurse: Leads to overfit typically
MSmits: hmm, why would someone do that on purpose
reCurse: There's a sweet spot for extracting the maximum usefulness of a piece of data
reCurse: Too little and you're wasteful, too much and you overfit
MSmits: yeah that occurred to me
MSmits: especially if you sample
LuisAFK: how can i open a private chat I accidentally closed?
reCurse: So what I meant is
MSmits: how much do you use the same sample from a bigger set, before taking a new sample
reCurse: Batch size 256, 100 training steps
reCurse: Ok, what if your dataset is 1k vs 10k
reCurse: Can have an impact
reCurse: If your dataset is 1M vs 10M you can probably not care
MSmits: right
reCurse: So those three together make a very stable definition in my mind
reCurse: Disclaimer: Don't forget I'm a clown, not an actual scientist
MSmits: :clown:
MSmits: the only reason you're not "officially" a scientist is because you dont publish.
MSmits: you know how much crap is published
reCurse: No the actual reason is I didn't take any formal education on that and wouldn't dare to pretend to hold my candle to people actually working the field
reCurse: Don't mistake me for a teacher
MSmits: part of this is impostor syndrome. I think there are people working in that field that know less than you do. Not everyone is a star in their field :P
reCurse: I'm not being cute I actually mean it
MSmits: I know
MSmits: I thought you did a formal computer science education?
reCurse: CS sure
reCurse: ML, stats, no
MSmits: ahh ok
MSmits: I wonder how often it happens that competitive coders have an idea that is an improvement on existing science, but then don't share it for competitive reasons
MSmits: it must happen at least sometimes
reCurse: I sometimes get the impression some papers on AI are done on settings that would be equivalent to a gold league bot in terms of performance and whatnot.
MSmits: definitely that
reCurse: Or comparing MCTS results with 100 iterations... what?
MSmits: you can use the ideas, but they are poorly implemented
MSmits: yeah, thats also what i meant by star in their field. Sometimes someone is just going for their PHD with barely any experience and they do actual science, but not very well
reCurse: If you go the empirical route you need to at least be in competitive terms
MSmits: agreed
reCurse: Otherwise what credibility does the result have
reCurse: But a lot of time there's not much reference
reCurse: That's why CG is a goldmine in that regard
kovi: i agree. and its a weird contradiction if you consider chess/bitboarding
MSmits: it's partially because knowing a lot about theory of AI, doesn't make you able to code a bitboard using simd
MSmits: or use anything other than python
sprkrd: scientists are not necessarily good coders, I'd say gold league bots enjoy micro-optimizations (e.g. bitboards) that paper authors don't care to implement
reCurse: kovi: contradiction?
reCurse: Yeah but if you compare your approach empirically using subpar implementations, how am I supposed to believe the results?
sprkrd: and I would say that is fine as long as all the compared alternatives are using the same subpar environment implementations
MSmits: sprkrd i dont entirely agree, for example, minimax works better than mcts with low performance
MSmits: with higher performance, mcts starts to take over
MSmits: depending on the game ofc
MSmits: (and eval quality)
sprkrd: i was thinking more of comparing apples to apples
sprkrd: MCTS variants among them, for instance
kovi: chat slow/dead?
reCurse: Chat is fine on my end
kovi: meh, dice duel
emh: I was thinking about making an educational video on bitboarding. anyone have experience using manim? I tried a bit with manim (software behind 3 blue 1 brown), but now I'm thinking js+html might be easier
MSmits: sprkrd, some mcts variants might work better in one amount of calculation time and then if you add more, the other variant could be better
MSmits: whats manim?
sprkrd: Most papers I've seen about MCTS don't use calculation time as a meaningful metric
sprkrd: they use amount of simulations
kovi: the contradiction: for mcts/whatever, CG top >= top science, but by the chess bitboarding standards you mentioned, we are beginners
MSmits: One of these days I am going to show some ways to bitboard with UTTT in a tech.io article I think. Just because thats where most players start
sprkrd: whatever time it takes to reach that many simulations
reCurse: kovi: I don't think I communicated properly again. I meant 'some' papers are way subpar or are working in games with no good reference.
emh: MSmits manim is mathematical animation software
reCurse: In no way did I mean 'top science'
emh: https://github.com/3b1b/manim
MSmits: sprkrd thats better indeed. But even then, you also have to use examples with high rollout counts (> 1M )
reCurse: Chess is also different because it's actually a game that received an enormous amount of competitive attention
kovi: true, without reference platform/game hard to assess value
MSmits: and sometimes it's an unfair comparison: 1 rollout could be more expensive because one mcts variant is heavier than the other
kovi: yes, that is why we are getting better... we have a reference platform and pet multis
reCurse: If you compare your approach and you mention in experiment settings '100 iterations for MCTS' I'm closing the tab
reCurse: Stuff like that
MSmits: yeah, but 1k, 10k, 100k, 1M would be ok
MSmits: 1k is not bad if you have very small branching
reCurse: You get the idea
MSmits: ye
sprkrd: Maybe they're 100 high quality iterations :)
sprkrd: Maybe the default policy is a gold league bot on its own
MSmits: even so, you lose too much to exploration
MSmits: mcts just doesnt work with that low iterations
MSmits: minimax always works
Nerchio: offtopic but is there any place where you can check how many legend/gold bots you have
MSmits: yeah CG multi
Nerchio: danke MSmits
MSmits: https://cgmulti.azke.fr/players?p=Nerchio
MSmits: nice job on ooc Nerchio :)
sprkrd: Anywho, I haven't seen any paper so daring as to propose experiments with 100 MCTS iterations :joy:
reCurse: I did more than once :/
reCurse: Sometimes less
sprkrd: That's certainly ridiculous
MSmits: I've seen some crap ones. Not sure if it was 100, but it was bad
sprkrd: Accepted in a reputable conference/journal?
sprkrd: Or more like school work?
reCurse: I don't pay attention to that, maybe I should
MSmits: conference/journal. I dont know their reputations
reCurse: But maybe I would miss a good idea
reCurse: I still appreciate learning from different ideas/approaches, but at some point I'll filter on bad experiments
MSmits: I like the winands ones.
MSmits: not all good, but mostly good reads
reCurse: winands are very good
MSmits: the pseudo code is a bit crappy :P
sprkrd: Sure, you shouldn't skip paper based on the reputation of the venue, but it's certainly more likely to find bad papers on arxiv than on AAAI
MSmits: but at least they give code :)
reCurse: I don't know, the vast majority of my reading is on arxiv
reCurse: Found very very good stuff there
reCurse: I'd rather not pay publishers who exploit researchers
sprkrd: Mind you, some of the good stuff on arxiv have actually been published somewhere else, and they're on arxiv so the paper is publicly available
sprkrd: While there are certainly ethic issues in academia, peer review is almost always a good thing
reCurse: The gold nugget is finding the paper review on openreview
reCurse: If all papers had that it would be amazing
sprkrd: Shameless (and off-topic) plug: https://www.youtube.com/watch?v=8n8mF-RmsNs (paper available on description :))
reCurse: Frame rate makes it a bit difficult to judge
sprkrd: the good stuff is the bloopers at the end
reCurse: It's cool though
sprkrd: computer was running like lava while I was recording this
sprkrd: I'm thankful to have footage at all
reCurse: Hehe yeah, it's just animation at 10fps is difficult to see
kovi: no flying or underground?
sprkrd: crawling :)
sprkrd: flying and underground I fixed at the very beginning
sprkrd: Too many flying darwins
kovi: somersault or cartwheel
kovi: ai finding the holes or bending the rules is fun
sprkrd: Oh, when I ran the thing on the real robot, instead of walking forward, it did the moonwalking
reCurse: Nice
sprkrd: Didn't have time to fix that, but it was pretty funny
sprkrd: https://www.youtube.com/watch?v=lOaWvOA9cb4
reCurse: Cool robot
jacek: oh my
StevensGino: isn't it from 2017?
CamTheHelpDesk: since the puzzle this week is bot programming, what counts as solving it for the homepage path?
RoboStac: getting out of the initial league
LuisAFK: https://www.codingame.com/ide/puzzle/hidden-messages-in-images
Shelby: http://chat.codingame.com/pastebin/cf7931f2-0aff-46e6-ba67-9423d51c9574
Shelby: ... That's frustrating...
Shelby: I was talking about how it sucks to get thrown into Bronze when I've never tested my code in Wood 1.
Shelby: Or even got to read any of the Wood 1 instructions.
Shelby: Happens almost every contest I join.
reCurse: You should just be happy to be out of the woods imo
reCurse: Focus on the real game rather than made up ones
Westicles: This is what the kids call a humble brag
Shelby: Yeah, but I don't get a chance to look at any of the rules introduced in Wood 1, and I don't get to play with it.
reCurse: I guess, but you usually don't get much usefulness out of it once you promote still
reCurse: In my experience anyway
FrancoRoura: All games are made up in the end, I also think it'd be fun to have one more game to play with
Shelby: I've never breached past Bronze (at least not that I know of).
reCurse: Sure, I meant 'made up' in the sense that you made the actual one, then you just sort of butcher it in an attempt to be more beginner friendly
Shelby: I always get stuck when I don't get to do anything with Wood 1.
reCurse: Here's chess, oh but we're going to start with only pawns
jacek: shameless advertising
olaf_surgut: does making some kind of skip list in CvZ for dead humans/zombies give a meaningful speedup?
jacek: but that could also mean breakthrough advertising :thinking:
reCurse: All intended :P
Shelby: I don't even know what rules are added between Wood 2 and Wood 1, versus if any rules are added between Wood 1 and Bronze.
jacek: check out the referee code
Westicles: Shelby, part of it is the superior education system in the US. Whatever code you throw together will go far
Shelby: lmao
Shelby: Superior education system = my mother
Shelby: She didn't teach me everything I know. Not by a longshot. But she did teach me the basics of coding logic when I was a very small child, and she taught me how to google as I grew up.
reCurse: Taught you how to google? You're already ahead of the average
Shelby: She went to online university for web design when I was 8, so I got a major dose of college education while in elementary school.
jacek: she taught you javascript? thats child abuse!
Shelby: Actually, the vbScript with classic ASP might be a lot closer to child abuse. That code is nasty.
Shelby: note: I do _not_ mean vb.net or asp.net.
reCurse: I don't know, people still code in PHP
reCurse: Isn't that the same thing? :P
Shelby: Javascript and PHP 5 are pretty similar. PHP does a weird thing where you have to explicitly pass it by reference, or it copies the values. And you have to prefix variables. Besides that, the syntax and data types are pretty similar. It was a shock to go from mostly JS and PHP over to C# and Java, where data maps aren't first-class.
reCurse: I was comparing PHP to classical ASP
Shelby: http://chat.codingame.com/pastebin/fb783e1f-f530-47b4-99df-50d504fc3f18
Shelby: ...
jacek: oO
Shelby: C# is super nice. PHP is okay. Java is torture. vbScript... you might as well be using Bash for your web server.
Nerchio: you are torture
Nerchio: :grin:
Scarfield: "no offence" :p
Xzoky174: I'm stuck on level 6 for so long..
jacek: :no_mouth:
ErrorCookie: I am doing a challenge where I need to find the exits of a maze. I came up with a recursive function that calls itself for every possible direction, and from each of those directions again for every possible direction, and so on. Is there a "better" algorithm for this?
jacek: DFS?
ErrorCookie: Ok thanks .(^_^).
jacek: what puzzle
ErrorCookie: Medium: Maze
jacek: so yeah, DFS and/or BFS would be good
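A hedged BFS sketch for that kind of maze problem; the grid encoding ('#' for walls, exits on the border) is an assumption, not the puzzle's actual format. Unlike unchecked recursion, each cell is visited at most once:

    from collections import deque

    def reachable_exits(grid, start):
        h, w = len(grid), len(grid[0])
        seen, exits, queue = {start}, [], deque([start])
        while queue:
            r, c = queue.popleft()
            if r in (0, h - 1) or c in (0, w - 1):
                exits.append((r, c))              # reachable border cell = exit
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and grid[nr][nc] != '#' and (nr, nc) not in seen:
                    seen.add((nr, nc))
                    queue.append((nr, nc))
        return exits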
jacek: Space maze. Community success rate: 5% :thinking:
Scarfield: Member of the 5% ?
sprkrd: Have 20 people tried and one of them is the author?
Jaminrock: the hypnotoad himself
Westicles: I highly recommend playing space maze manually. Much more fun than writing some boring program
jacek: do you do that with csb as well
Westicles: :thinking:
Gers2017: bbbbbbb
sprkrd: "some boring program"
sprkrd: better write a fun program, no?
jacek: functional?
Psuedo: Is it normal to struggle with these problems but in projects I rarely find myself struggling?
Psuedo: I think a lot of what I'm facing is trouble understanding the problem, as well.
jacek: puzzles here or most stuff here are unlike anything i do at work
AllYourTrees: are ppl testing out NNs in the connect 4 thing?
jacek: dunno, i dont. i use good old N-tuple
AllYourTrees: what's N-tuple?
AntiSquid: the *thing*
jacek: patterns
jacek: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.108.5111&rep=rep1&type=pdf
YoloTheBear: In 1D spreadsheet my code keeps failing on the last test (Deep Birecursion), idk how I could make the recursion more optimized, it just times out
therealbeef: timeouts are sometimes caused by crashes
DUNKEN: Has anyone tried this out:
DUNKEN: https://www.codingame.com/multiplayer/bot-programming/connect-4
Bob23: hello world!
Smelty: :eyes:
StepBack13: DUNKEN, trying it now. diagonals are so hard to stop!
JimmyJams: reverse code clashes should validate against the hidden tests before submitting. Otherwise we have no idea if our solution is the right one.
JimmyJams: they don't need to show the hidden ones, just tell us if they passed or failed
Smelty: hmm
Slavvy: bro how do I continue
TNtube: JimmyJams i approve this
TNtube: it's really frustrating when all your tests pass but the answer turns out to be a totally different algo
Smelty: yep i agree
Smelty: might want to just add more test cases for reverse mode though