𞋴𝛂𝛋𝛆

  • 0 Posts
  • 4 Comments
Joined 2 years ago
Cake day: June 9th, 2023

  • Actually look at the way Discord works on your network: all the raw IP addresses and connections with no clear ownership or human-readable name, with dozens of changing connections required to get any of it to work. Then go try to ask questions about what is going on and who you’re connecting to. Discover that none of it is documented or described anywhere. Then realize that this means no one running Discord is doing so on a fully audited and logged host; you simply cannot do so without a great deal of effort. I made it to the sixth layer of whitelisted raw IP addresses, and still nothing worked while trying to connect to Discord from a fully logged and documented network. I am simply unwilling to write a script to annotate that many connections so that all of my logs make sense (a rough sketch of what such a script would involve follows this comment). I seriously doubt anyone on Discord is doing so, and they certainly lack any understanding of what they are connecting to, why, or over which protocols.

    So the Discord user is telling me “my opsec and privacy awareness is as nonexistent as a pig in a herd running off a cliff, and my system should be assumed compromised with no idea of what might be connected.” “Everyone else is doing it” is a garbage excuse. “No one appears to have gotten hurt” has tissue-thin merit, but it also reveals that the user runs blind in herds while hoping for the best. Such information implies a lot about a person, their depth, accountability, and ethics, in certain scopes.
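
As a minimal illustration of the kind of annotation script mentioned above (not anything the commenter actually wrote), a sketch might resolve each raw IP from a connection log to a name via reverse DNS so the log stays readable; the log filename, the one-IP-per-line format, and the reverse-DNS-only approach are all assumptions, and a real version would also need WHOIS/ASN lookups:

```python
# Hypothetical sketch of an annotation script: resolve raw IPs from a
# connection log to names via reverse DNS so firewall logs stay readable.
# The log path and one-IP-per-line format are assumptions for illustration.
import socket

def annotate_ip(ip: str) -> str:
    """Return 'ip<TAB>hostname', or a marker when no PTR record exists."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
        return f"{ip}\t{hostname}"
    except (socket.herror, socket.gaierror):
        return f"{ip}\t<no PTR record>"

if __name__ == "__main__":
    with open("connections.log") as log:   # hypothetical list of outbound IPs
        for line in log:
            ip = line.strip()
            if ip:
                print(annotate_ip(ip))
```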



  • Makes sense. From the breadboard computer-building side, it is easy to build a basic computer with anything like a Z80, 6502, or 68k. It is much more of a pain to interface with video and, to a much lesser extent, I/O.

    It probably would have been more interesting if the video and I/O were designed as an independent system with a rear-inserted SBC-like cartridge that housed the principal processor and memory. Like, what if our first consoles were still relevant and only the controllers had really evolved? Assuming that these could have survived the intermediate era of disc-media data density into the era of high-density flash memory, we might still be using cartridges. An NVMe drive is basically a game cartridge analog.



  • Anything under 16 GB of VRAM is a no-go. Your number of CPU cores is important too. Use Oobabooga Textgen for an advanced llama.cpp setup that splits the model between the CPU and GPU (a minimal loading sketch follows at the end of this comment). You'll need at least 64 GB of RAM, or be willing to offload layers to the NVMe with DeepSpeed. I can run up to a 72b model with 4-bit GGUF quantization on a 12700 laptop with a mobile 3080 Ti, which has 16 GB of VRAM (the mobile variant is like that).

    I prefer to run an 8×7b mixture-of-experts model because only 2 of the 8 experts are ever running at the same time. I run it as a 4-bit quantized GGUF and it takes 56 GB total to load. Once loaded it is about like a 13b model for speed but has ~90% of the capabilities of a 70b. The streaming speed is faster than my fastest reading pace.

    A 70b model streams at my slowest tenable reading pace.

    Both of these options are far more capable than any of the smaller model sizes, even if you screw around with training. Unfortunately, this streaming speed is still pretty slow for most advanced agentic stuff. Maybe if I had 24 to 48 GB of VRAM it would be different; I cannot say. If I were building now, I would be looking at which hardware options have the largest L1 cache and the most cores with the most advanced AVX instructions. Generally, anything with efficiency cores drops the advanced AVX instructions, and because the CPU schedulers in kernels are usually unable to handle this asymmetry, consumer junk has poor AVX support. It is quite likely that the problems Intel has had in recent years have been due to how they tried to block consumer parts from accessing the advanced P-core instructions that were only disabled in microcode. Using them requires disabling the e-cores or setting up CPU set isolation in Linux or BSD distros (a small affinity-pinning sketch follows at the end of this comment).

    You need good Linux support even if you run Windows. Most good and advanced AI work will be done in WSL if you haven’t ditched Windows for whatever reason. Use https://linux-hardware.org/ to check support for devices.

    The reason I mentioned avoiding consumer e-cores is that there have been some articles popping up lately about all-P-core hardware.

    The main constraint for the CPU is the L2 to L1 cache bus width. Researching this deeply may be beneficial.

    Splitting the load between multiple GPUs may be an option too (a short tensor-split sketch follows at the end of this comment). As of a year ago, the cheapest way to get a 16 GB GPU in a machine was a second-hand 12th-gen Intel laptop with a 3080 Ti, by a considerable margin once it is all added up. It is noisy, gets hot, and I hate it many times, wishing I had gotten a server-like setup for AI, but I have something and that is what matters.
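
To make the CPU/GPU split concrete, here is a minimal sketch using llama-cpp-python, one of the loaders that Oobabooga Textgen can wrap; it is not the commenter's actual configuration, and the model filename, layer count, thread count, and context size are assumptions to be tuned for your own hardware:

```python
# Minimal sketch of the CPU/GPU split described above, using llama-cpp-python.
# The model path, layer count, and context size are placeholder assumptions,
# not the commenter's exact setup; requires a GPU-enabled build of llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-q4_k_m.gguf",  # hypothetical 4-bit GGUF file
    n_gpu_layers=20,   # how many layers fit in 16 GB of VRAM; tune for your card
    n_ctx=4096,        # context window
    n_threads=8,       # physical CPU cores handling the layers left in RAM
)

out = llm("Explain mixture-of-experts models in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```

With a 16 GB card, `n_gpu_layers` is the knob that decides how much of the roughly 56 GB quantized model lands in VRAM; the remaining layers stay in system RAM and run on the CPU threads.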
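
The P-core/e-core isolation mentioned above can also be approximated from inside a process on Linux. This sketch assumes the logical CPU numbering of a 12700-class chip (P-core hyperthreads enumerated first); verify the layout with lscpu before reusing the core IDs, and note that cgroup cpusets or disabling the e-cores in firmware are the heavier-weight alternatives:

```python
# Sketch of CPU-set isolation on Linux: pin this process to the P-cores so the
# kernel scheduler cannot migrate AVX-heavy inference threads onto E-cores.
# The core IDs below are an assumption (a 12700-class chip typically exposes
# its P-core hyperthreads as logical CPUs 0-11); check lscpu for your layout.
import os

P_CORES = set(range(0, 12))          # assumed P-core logical CPUs 0-11
os.sched_setaffinity(0, P_CORES)     # 0 = the current process

print("Running on CPUs:", sorted(os.sched_getaffinity(0)))
```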
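
And for the multi-GPU option, llama.cpp exposes a tensor split that divides the offloaded layers across visible GPUs; the ratio and model path below are placeholders, not a tested configuration:

```python
# Sketch of splitting one model across two GPUs with llama-cpp-python's
# tensor_split option; the 60/40 ratio and the model path are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-70b-q4_k_m.gguf",  # hypothetical quantized model
    n_gpu_layers=-1,                     # -1 = offload every layer to the GPUs
    tensor_split=[0.6, 0.4],             # fraction of the model per visible GPU
)
```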