ClamScan should use local static analysis + sandboxes + artificial CNS (central nervous systems) to secure computers. #1206
Comments
This comment was marked as duplicate.
Thanks for the... interesting suggestion. This approach does not seem workable for a number of reasons, not the least of which is the apparent lack of a coherent suggestion and workable implementation plan. Since you're obviously a fan of "AI" I've asked Gemini to assist in drafting the remainder of my response:
Resource Challenges:
False Positive Issues:
Current Methods Work Well:
Alternative Solutions:
|
Do not trust AI; AI is just sin, is not an artificial CNS. Resources: This post suggests producing an artificial CNS, and shows you FLOSS resources for artificial CNS (such as APXR and HSOM) that have examples of how to set one up. This post also suggests use of heuristic analysis plus sandboxes, and links to resources (such as Virustotal/Zenbox) that do so for us. Current methods: Other researchers would not have begun to produce new methods if the old methods were good enough for us. How this affects us: Safety concerns are the main reason that autonomous robots do not work outdoors to mass produce structures such as houses for us. |
It's clear that you don't have the depth to engage on this topic. Artificial Neural Networks (ANNs) aren't exactly the same as a human brain (CNS). However, ANNs are inspired by the structure and function of the brain and fall under the broad umbrella of Artificial Intelligence (AI). AI encompasses various approaches to mimicking human intelligence, and ANNs are one specific technique.
You know what already uses heuristics? ClamAV! https://blog.clamav.net/2011/03/top-5-misconceptions-about-clamav.html I'll also note quickly that the blog post also indicates that the ClamAV team use sandboxes, though perhaps not in the automated way that you're envisioning (some sort of honeypot perhaps?)
It is clear that you do not understand how antiviruses and endpoint protection services work. It is uncommon to 'undo the infection' (i.e. clean infected files), instead these tools focus on preventing the exploitation of a device by preventing the execution of "bad" code on an endpoint (and detecting and quarantining infected files).
|
Gemini is not able to follow links or parse sources. Lots of antiviruses are able to undo infection from programs, It was stupid of me not to have found those pages about how ClamAV/ClamScan uses some heuristic analysis, |
I agree with the sentiment of your request. It is a good request to investigate AI / ML to identify malware. Just last week, the Snort team released SnortML, which is a module for Snort that may load ML models to classify HTTP URI inputs to identify zero day attacks: https://blog.snort.org/2024/03/talos-launching-new-machine-learning.html It would be wonderful to add detection capabilities to ClamAV. It seems like a promising research area for folks interested in malware research. |
Updated original post (English fixes, + extra examples/sources) |
This is too large of a request. If you want to make such a thing, we could possibly accept a pull request with this kind of feature added. It is also probably too resource intensive to run on the devices that ClamAV uses. |
Is fast with caches. To train (produce synaptic weights for) the CNS, is slow plus requires access to huge sample databases, |
This comment was marked as duplicate.
Artificial central nervous system's |
This comment was marked as duplicate.
Original post was pseudocode, is now C++. |
Original post has new fixes. Comments have new fixes. |
@ETERNALBLUEbullrun The concepts you're discussing are so far outside my wheelhouse that it mostly sounds like ChatGPT made up some tech jargon. The code you shared isn't what I would call C++. It's just C++ wrapping around Python code. Sorry, we're not interested. |
SwuduSusuwu/SubStack#6 " Was that the sole concern? With C++ implementation of |
Last post before this ( #1206 (comment) ) was about how to produce virus signatures (which is just one submodule of this issue). Is that what you are referring to? Am curious: what can you ask ChatGPT which has a chance to produce this? Which part confused you? Was it the part about how formulas to compress data (lossless) with codebooks, are close to formulas to produce virus signatures? Formulas such as
This is not a concept, executable code exists. |
Update Oct21: use
|
Was the confusion from the original post's For comparison; |
Table of Contents ["Allows all uses"]
Intro
Static analysis + sandbox + CNS = 1 second (approx) analysis of new executables (secures all app launches,) but after first launch: caches reduce this to less than 1ms (just cost to compute `caches.at(classSha2(FileBytecode()))`, where `caches` is `std::map<ResultListHash, VirusAnalysisResult>` or `ResultList::hashes`).
`../README.md` has how to use this (what follows is more of a book of source code).
(Removed duplicate licenses, `#if` guards, `#include`s, `namespace`s, `NOLINTBEGIN`s, `NOLINTEND`s from all except `main.hxx`; follow URLs for whole sources.)
[Version of post is `@cxx/Macros.hxx:SUSUWU_PRINT()`; remove shorthand @6773ff3 ]
For the newest sources (+ static libs), use apps such as iSH (for iOS) or Termux (for Android OS) to run this: `git clone https://github.com/SwuduSusuwu/SubStack.git && cd ./SubStack/ && ./build.sh`
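A minimal sketch of the cache lookup which the intro describes, under stated assumptions. The post keys `caches` with `classSha2(FileBytecode())`; the C++ standard library has no SHA-2, so `exampleHash()` (FNV-1a, 64 bits) is a hypothetical placeholder, and `ExampleHash`, `ExampleResult`, `exampleSlowAnalysis()` and `exampleCachedAnalysis()` are names invented for this illustration (not from the post's sources).

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins; the post's own types are ResultListHash and VirusAnalysisResult.
using ExampleHash = std::uint64_t;
enum class ExampleResult { pass, block };

// FNV-1a (64-bit): placeholder for classSha2(). Not collision-resistant, so not for real security use.
ExampleHash exampleHash(const std::string &bytecode) {
	ExampleHash hash = 14695981039346656037ULL; // FNV offset basis
	for (unsigned char byte : bytecode) {
		hash ^= byte;
		hash *= 1099511628211ULL; // FNV prime
	}
	return hash;
}

int exampleSlowAnalysisCount = 0; // counts how often the slow path runs

// Placeholder for static analysis + sandbox + CNS (the slow, approx 1 second, path).
ExampleResult exampleSlowAnalysis(const std::string &bytecode) {
	++exampleSlowAnalysisCount;
	return (std::string::npos != bytecode.find("infection")) ? ExampleResult::block : ExampleResult::pass;
}

// First launch pays for the slow analysis; later launches cost one hash plus one map lookup.
ExampleResult exampleCachedAnalysis(std::map<ExampleHash, ExampleResult> &caches, const std::string &bytecode) {
	const ExampleHash key = exampleHash(bytecode);
	const auto found = caches.find(key);
	if (caches.end() != found) {
		return found->second;
	}
	const ExampleResult result = exampleSlowAnalysis(bytecode);
	caches.emplace(key, result);
	return result;
}
```

This uses `find()` (not `at()`) so that a cache miss falls through to the slow path without an exception.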
To test certificates, view this post.
To improve how fast the whole program executes, `CXXFLAGS` should include flags which auto-vectorize/auto-parallelize. [1]
To improve how fast backpropagation (`Cns::setupSynapses()`, which {`produceAnalysisCns()`, `produceVirusFixCns()`} use) executes, implement `class Cns` with TensorFlow's `MapReduce`. [2]
Source code
(C) 2024 Swudu Susuwu, dual licenses: choose GPLv2 or Apache 2, allows all uses.
cxx/Macros.hxx #Removed: disabled color codes + unused OSC codes
cxx/Macros.cxx
cxx/ClassObject.hxx
cxx/ClassObject.cxx #This is just unit tests. `ClassObject.hxx` has all which has actual use.
cxx/ClassPortableExecutable.hxx
cxx/ClassSys.hxx
cxx/ClassSys.cxx
cxx/ClassSha2.hxx
cxx/ClassSha2.cxx
cxx/ClassResultList.hxx
cxx/ClassResultList.cxx
cxx/ClassCns.hxx
cxx/ClassCns.cxx
cxx/VirusAnalysis.hxx
cxx/VirusAnalysis.cxx
cxx/main.hxx #With boilerplate
cxx/main.cxx
Comparison to assistants
For comparison; `produceVirusFixCns()` is close to assistants (such as OpenLM Research's "OpenLLaMA" or Anthropic's "Assistant";) have such demo as `produceAssistantCns()`;
cxx/AssistantCns.hxx
cxx/AssistantCns.cxx
Post, with resources
Hash resources:
Hash is just a checksum (such as Sha-2) of all sample inputs, which maps to "this passes" (or "this does not pass".)
Signature resources:
Signature is just a substring (or regular expression) specific to infections, which the virus analysis tool searches all executables for; if the signature is found in the executable, do not allow to launch, otherwise launch this.
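A minimal sketch of the substring form of signature checks described above (regular expressions are left out). `exampleHasSignature()` and `exampleAllowLaunch()` are hypothetical names for this illustration, and real tools use multi-pattern search over large signature databases, which this does not attempt.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true if any signature occurs as a substring of the executable's bytes.
bool exampleHasSignature(const std::string &executableBytes, const std::vector<std::string> &signatures) {
	for (const std::string &signature : signatures) {
		if (signature.empty()) {
			continue; // an empty signature would match all executables
		}
		const auto found = std::search(executableBytes.begin(), executableBytes.end(), signature.begin(), signature.end());
		if (executableBytes.end() != found) {
			return true;
		}
	}
	return false;
}

// If a signature is found, do not allow launch; otherwise allow.
bool exampleAllowLaunch(const std::string &executableBytes, const std::vector<std::string> &signatures) {
	return !exampleHasSignature(executableBytes, signatures);
}
```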
Static analysis resources:
https://github.com/topics/analysis has lots of open source (FLOSS) app/SW (executable) analysis tools (such as https://github.com/kylefarris/clamscan, which wraps https://github.com/Cisco-Talos/clamav/,)
which show how to process raw executables (or their disassembled sources) to deduce what those do to your OS.
Most static analysis (such as Clang/LLVM has) just checks programs for accidental issues (such as buffer overflows, underruns, or null pointer dereferences,) but has uses as a basis for virus analysis;
you can expand such so that checks for deliberate vulnerabilities/signs of infection (these are heuristics, so the user should have a choice to isolate and submit for review, or continue launch of this) are included.
`clang/lib/StaticAnalyzer` is part of Clang/LLVM (license is FLOSS,) does static analysis (emulation produces inputs to functions, formulas do analysis of stacktraces (+ heap/stack uses) to produce lists of possible unwanted side effects to warn you of).
Versus instrumentation such as `-fsanitize`, you do not have to recompile to do static analysis. `-fsanitize` requires you to produce inputs, static analysis tests all/most possible inputs for you.
LLVM/Clang is lots of files; you can clone Phasar if you want just its static analysis.
Example outputs (tests “Fdroid.apk”), of VirusTotal's static analysis + 2 sandboxes;
the false positive outputs (from VirusTotal's Zenbox) show the purpose of manual review.
Sandbox resources:
As opposed to static analysis of the executable's hex (or disassembled sources,)
sandboxes perform chroot + functional analysis.
Valgrind is just meant to locate accidental security vulnerabilities, but is a common example of functional analysis.
If compliant to POSIX (each Linux OS is), tools can use:
`chroot()` (run `man chroot` for instructions) so that executables sent to analysis are restricted to the path of analysis;
`strace` (run `man strace` for instructions, or view opensource.com's or geeksforgeeks.org's examples) which hooks all system calls (to store logs for functional analysis).
Old fashioned sandboxes just test executables with `chroot()` plus `strace` for a few seconds,
with all outputs from `strace` sent to manual reviews;
new sandboxes produce inputs (with the goal to act as a normal user) to send to those executables,
new sandboxes use heuristics to guess which outputs from the executable (or from `strace`) to send to reviews (so manual reviews have less to do); for example, repetitious accesses to resources which the executable produced on its own are ignored (or are counted as one access to such resources).
Autonomous sandboxes (such as Virustotal's) use full outputs from all analyses,
with calculus to guess if the executable is good for use (thousands of rules such as "Should not alter files of other programs unless prompted to through OS dialogs", "Should not perform network access unless prompted to from you", "Should not perform actions leading to obfuscation which could hinder analysis",)
which, if violated, add to the executable's "danger score" (which the analysis results page shows you.)
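A minimal sketch of the two sandbox heuristics above: repeated accesses count as one, and each violated rule adds to a "danger score". All names (`ExampleAccess`, `ExampleRule`, the two example rules, the `/sandbox/` path and the scores) are hypothetical; this is not how Virustotal computes scores, and the simplification here counts all repeats as one (it does not track which resources the executable produced on its own).

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical log entry: one traced system call plus the resource which it touched.
struct ExampleAccess {
	std::string syscall;
	std::string resource;
};

// Counts repeated accesses to the same resource as one access (keeps first-seen order).
std::vector<ExampleAccess> exampleCollapseRepeats(const std::vector<ExampleAccess> &log) {
	std::set<std::pair<std::string, std::string>> seen;
	std::vector<ExampleAccess> collapsed;
	for (const ExampleAccess &access : log) {
		if (seen.insert(std::make_pair(access.syscall, access.resource)).second) {
			collapsed.push_back(access);
		}
	}
	return collapsed;
}

// Hypothetical rule: a predicate over one access, plus the score which a violation adds.
struct ExampleRule {
	bool (*violates)(const ExampleAccess &);
	int score;
};

// "Should not alter files of other programs": here, writes to paths outside of /sandbox/.
bool exampleWritesOutsideSandbox(const ExampleAccess &access) {
	return "write" == access.syscall && 0 != access.resource.rfind("/sandbox/", 0);
}

// "Should not perform network access": here, all connect calls.
bool exampleUsesNetwork(const ExampleAccess &access) {
	return "connect" == access.syscall;
}

// Sums the scores of all rules violated by the collapsed log.
int exampleDangerScore(const std::vector<ExampleAccess> &log, const std::vector<ExampleRule> &rules) {
	int score = 0;
	for (const ExampleAccess &access : exampleCollapseRepeats(log)) {
		for (const ExampleRule &rule : rules) {
			if (rule.violates(access)) {
				score += rule.score;
			}
		}
	}
	return score;
}
```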
Neural resources
CNS resources:
Once the virus analysis tool has static+functional analysis (+ sandbox,) the next logical move is to do artificial CNS.
Just as (if humans grew trillions of neurons plus thousands of layers of cortices) one of us could parse all databases of infections (plus samples of fresh executables) to setup our synapses to parse hex dumps of executables (to allow us to revert all infections to fresh executables, or if the whole thing is an infection just block,)
so too could artificial CNS (with trillions of artificial neurons) do this:
GitHub has lots of FLOSS simulators of neural tissue which have uses to program virus analysis tools (or assistants such as ChatGPT 4.0 or Claude-3 Opus,) but not sufficient to house human consciousness:
HSOM (`git clone https://github.com/CarsonScott/HSOM.git`, license is FLOSS) is a simple Python neural map.
`./src/examples/` has examples of how to set up as artificial CNS.
Simple to set up once you have relevant databases downloaded.
apxr_run (`git clone https://github.com/Rober-t/apxr_run/.git`, license is FLOSS) is almost complex enough to house human consciousness;
`./src/lib/functions.erl` has various FLOSS neural network activation functions (absolute, average, standard deviation, sqrt, sin, tanh, log, sigmoid, cos), plus sensor functions (vector difference, quadratic, multiquadric, saturation [D-zone], gaussian, cartesian/planar/polar distances).
`./src/lib/plasticity.erl` has various FLOSS neuroplastic functions (self-modulation, Hebbian function, Oja's function).
`./src/agent_mgr/signal_aggregator.erl` has various FLOSS neural network input aggregator functions (dot products, product of differences, mult products).
`./src/lib/tuning_selection.erl` has various simulated-annealing functions for artificial neural networks (dynamic [random], active [random], current [random], all [random]).
`./src/agent_mgr/neuron.erl` has choices to evolve connections through Darwinian or Lamarckian formulas.
`./examples/` has examples of how to set up as artificial CNS.
Simple to convert Erlang functions to Java/C++ (to reuse for fast programs); the dynamic-typed, functional, concurrent syntax is close to Lisp's. Fortran to Erlang converters exist (plus Erlang has a Fortran frontend), which you can peruse for clues on how to convert Erlang to C++.
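A minimal sketch of how a few of the listed functions could look in C++ (sigmoid activation, dot product aggregator, Hebbian and Oja's plasticity). These are the textbook forms, written for illustration; they are not translated from the Erlang sources of apxr_run, and all `example*` names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Activation function: logistic sigmoid.
double exampleSigmoid(double x) {
	return 1.0 / (1.0 + std::exp(-x));
}

// Input aggregator: dot product of inputs and weights (assumes both have the same size).
double exampleDotProduct(const std::vector<double> &inputs, const std::vector<double> &weights) {
	double sum = 0.0;
	for (std::size_t i = 0; i < inputs.size() && i < weights.size(); ++i) {
		sum += inputs[i] * weights[i];
	}
	return sum;
}

// Hebbian plasticity: w[i] += rate * x[i] * y (weights grow without bound if left unchecked).
void exampleHebbianUpdate(std::vector<double> &weights, const std::vector<double> &inputs, double output, double rate) {
	for (std::size_t i = 0; i < inputs.size() && i < weights.size(); ++i) {
		weights[i] += rate * inputs[i] * output;
	}
}

// Oja's rule: w[i] += rate * y * (x[i] - y * w[i]); the extra term keeps weights bounded.
void exampleOjaUpdate(std::vector<double> &weights, const std::vector<double> &inputs, double output, double rate) {
	for (std::size_t i = 0; i < inputs.size() && i < weights.size(); ++i) {
		weights[i] += rate * output * (inputs[i] - output * weights[i]);
	}
}
```

A neuron's output is then `exampleSigmoid(exampleDotProduct(inputs, weights))`, with one of the update functions applied to the weights after each sample.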
Synopsis + related posts
This post was about general methods to produce virus analysis tools, which do not require that local resources do all of this;
Alternative CNS structure: based on albatross (includes numerous resources, about all sorts of natural/artificial neural tissue).
How to reproduce the problem
Scan new executables (that are not part of stock databases)
Footnotes
How to improve performance of compute.