Commit 599f05b: Add files via upload
benrayfield authored Apr 13, 2024
1 parent db29f15
Showing 1 changed file with 34 additions and 11 deletions: lib/LeastSquaresLSTMQlearn.js
@@ -45,7 +45,7 @@ Sparse compute state: action[a](invec)->otherInvec transforms for the a possible
Inputs: invec
Outputs: outvec
-3 main tools: dagball, axgob, qlearnLSTM. also relearnTk if I can.
+3 main tools: dagball, axgob, qlearnLSTM.
This will in theory go viral, Kfactor >> 1, after I demo it for a few things that are useful and fun for me. It won't get nearly as big as an LLM
but it will fill a niche for automating lots of small tasks. A bunch of them working together in opensource peer to peer thru
browsers might learn to do bigger things together, including using tools the qlearnLSTMs build. So 3 main tools: dagball, axgob,
@@ -54,7 +54,8 @@ or maybe a feedforward sigmoid layer ins -> sigmoids -> LSTM -> sigmoid -> outs,
model params. ins and outs only happen once. LSTM recurrent happens n times (maybe 10-100). It's a function of
i ins (and model weights, which number in the thousands) to o outs. Use GPU.js or expandedToBigMem(Ap.js)OrTinyGLSL for matmul.
this qlearn object will have state of such weights and a set of trainingData. Each trainingData will
-have 1+numPossibleNextStates inVecs. each inVec is whatever the qlearnLSTM would observe at that game state, which may be
+have 1+numPossibleNextStates inVecs (UPDATE: just use action(invec) to generate those, don't store them).
+each inVec is whatever the qlearnLSTM would observe at that game state, which may be
whole game state (if game is simple) or partial game state (if game is more complex). Usually it will be a partial game state.
Each trainingData will also have a learnRate aka the weight of how much this trainingData should influence learning.
That's multiplied by globalLearnRate. Normally trainingData learnRate is 1. If trainingData does not exist it is 0.
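
A minimal sketch (not part of this commit) of the weighted experience-replay loss those comments describe: sum of weight*(reward+.99*max(nextQ) - Q)^2 over all trainingData. The names qLossSketch and predictQ, and the {invec, weight} record shape, are illustration-only assumptions; the nextQ values are generated by playerAction(invec) per the UPDATE above rather than stored.

function qLossSketch(trainingData, playerActions, predictQ, rewardFunc){
	var loss = 0;
	for(var i=0; i<trainingData.length; i++){
		var invec = trainingData[i].invec, weight = trainingData[i].weight;
		var maxNextQ = -Infinity; //best neural qscore among possible next states
		for(var a=0; a<playerActions.length; a++){
			maxNextQ = Math.max(maxNextQ, predictQ(playerActions[a](invec)));
		}
		var diff = rewardFunc(invec) + .99*maxNextQ - predictQ(invec);
		loss += weight*diff*diff; //weight is that trainingData's learnRate
	}
	return loss;
}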
@@ -76,12 +77,12 @@ QL = (()=>{
//mostly or completely immutable, except worldAction may be mutable
//in that it may be a connection to a mutable external world or a deterministic sim.
-//actions is a [] list of js functions of Float32Array->Float32Array, same size array each time. Its invec->invec,
+//playerActions is a [] list of js functions of Float32Array->Float32Array, same size (this.stateSize) array each time. Its invec->invec,
//to transform game state to next game state, if that action is chosen, such as an action of "move in dimension5 +0.3 amount"
//or "push button B".
//inSize is Float32Array.length and the number of input floats that go into the qlearner neuralnet.
//outSize is 1+actions.length. The first is qscore of the given invec. After that its each action, a transform of invec to invec.
-//rewardFunc(invec)->num is the reward in the sum of 1 weight*(reward+.99*max(nextQ) - Q)^2. per trainingData.
+//rewardFunc(invec)->num is the reward in the sum of 1 weight*(reward+.99*max(nextQ) - Q)^2 per trainingData.
//
QL.Game = function(inSize, actionFuncs, rewardFunc){
this.inSize = inSize;
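
A hypothetical usage sketch of this constructor (not in the commit), following the "move in dimension5 +0.3 amount" example above; moveDim5, firstFloatReward, and the inSize of 16 are made-up names and values:

var moveDim5 = function(invec){
	var next = new Float32Array(invec); //copy: same-size invec->invec transform
	next[5] += 0.3;
	return next;
};
var firstFloatReward = function(invec){ return invec[0]; }; //toy rewardFunc(invec)->num
var game = new QL.Game(16, [moveDim5], firstFloatReward); //inSize 16, 1 playerAction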
@@ -105,31 +106,53 @@ QL = (()=>{
//given a Float32Array of size game.stateSize, returns a Float32Array of size 1+game.playerActions.length.
//The 1+ is the neural estimated qscore of that invec. The other game.playerActions.length values are the neural approximated qscores of each
//of those next states in the possible case of doing those actions, even though that action wasn't necessarily done here.
-QL.predict = function(invec){
+QL.LeastSquaresLstmQlearner.prototype.predict = function(invec){
throw new Error('TODO');
};
//returns an integer 0 to game.playerActions.length-1.
-QL.chooseAction = function(invec){
+QL.LeastSquaresLstmQlearner.prototype.chooseAction = function(invec){
throw new Error('TODO');
};
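
//Sketch (not part of this commit): epsilon-greedy is one common way to fill in chooseAction,
//using predict's output layout where outvec[0] is the qscore of invec itself and outvec[1+a]
//is the qscore after playerAction a. Epsilon-greedy and the name chooseActionSketch are assumptions.
function chooseActionSketch(learner, invec, epsilon){
	var outvec = learner.predict(invec);
	var numActions = outvec.length-1;
	if(Math.random() < epsilon) return Math.floor(Math.random()*numActions); //explore
	var best = 0;
	for(var a=1; a<numActions; a++) if(outvec[1+a] > outvec[1+best]) best = a; //exploit: argmax qscore
	return best;
}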
-QL.decayAllTrainingDataWeights = function(mult){
+
+QL.LeastSquaresLstmQlearner.prototype.decayAllTrainingDataWeights = function(mult){
for(var state in this.trainingData){
this.multiplyTrainingDataWeight(state,mult);
}
};
-QL.addToTrainingDataWeight = function(state,addToWeight){
+
+QL.LeastSquaresLstmQlearner.prototype.addToTrainingDataWeight = function(state,addToWeight){
this.setTrainingDataWeight(state,this.trainingDataWeight(state)+addToWeight);
};
-QL.multiplyTrainingDataWeight = function(state,multiplyWeightBy){
+
+QL.LeastSquaresLstmQlearner.prototype.multiplyTrainingDataWeight = function(state,multiplyWeightBy){
this.setTrainingDataWeight(state,this.trainingDataWeight(state)*multiplyWeightBy);
};
-QL.setTrainingDataWeight = function(state,weight){
+
+//weight of 0 removes the trainingData. Any nonzero weight adds it or updates it to that weight.
+//Used in the loss function of sum of weight*(reward+.99*max(nextQ) - Q)^2 for all trainingData together as experienceReplay.
+QL.LeastSquaresLstmQlearner.prototype.setTrainingDataWeight = function(state,weight){
if(weight == 0) delete this.trainingData[state];
else this.trainingData[state] = weight;
};
-QL.trainingDataWeight = function(state){
+
+QL.LeastSquaresLstmQlearner.prototype.trainingDataWeight = function(state){
return this.trainingData[state] || 0;
};
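
//Sketch (not part of this commit): one way these weight functions could drive experienceReplay
//each game step: slowly forget old trainingData, then upweight the state just visited.
//The names recordStepSketch, learner, and invec are assumed.
function recordStepSketch(learner, invec){
	learner.decayAllTrainingDataWeights(.999); //all weights shrink toward 0 over time
	learner.addToTrainingDataWeight(invec, 1); //normal trainingData learnRate is 1
}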

+//search for exactly equal Float32Arrays by content and sum their weights in this.trainingData
+QL.LeastSquaresLstmQlearner.prototype.mergeDupTrainingData = function(){
+throw new Error('TODO');
+};
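
//Sketch (not part of this commit) of what mergeDupTrainingData might do. Note that with
//trainingData as a plain {} object, a Float32Array key is already stringified (e.g. "0,1,0.5"),
//so equal-content arrays already collide into one key; this sketch assumes the keys are those
//stringified vectors and re-sums them through a canonical Float32Array roundtrip.
function mergeDupSketch(trainingData){
	var merged = {};
	for(var state in trainingData){
		var key = new Float32Array(state.split(',')).join(','); //canonical float32 content key
		merged[key] = (merged[key] || 0) + trainingData[state];
	}
	return merged;
}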

+//TODO optimize by GPU.js first, then by TinyGLSL.js andOr Ap.js (which uses TinyGLSL.js), which are, as of 2024-4-13, my js GPU libraries (WebGL2_GLSL) but are not scaled up yet
+//and only support as much memory as fits in 1 GPU core (up to about 1000 floats, a GLSL limit, even though a GPU core could hold maybe 10 times more;
+//shaders are supposed to be small and simple, and even though this work is not specific to shaders, it compiles to shaders so it has that limit).
+//So Ap.js and TinyGLSL can have around as much memory as fits in 1 GPU core, copied to all GPU cores at once, plus each core can then use local memory
+//differing from the other GPU cores, derived from that many-times-copied few kB of memory AND from its int thread id, similar to OpenCL.
+//
+//RecurrentJava is a CPU-only library for training LSTMs (very slowly). I had ported part of it to OpenCL in java but then moved
+//on to javascript. Maybe I'll use some of that learning algorithm to train the LSTM for least squares neural qlearning in the browser?
+//https://github.com/evolvingstuff/RecurrentJava/blob/master/src/model/LstmLayer.java

return QL;
})();
