Institutional Platform for Secure Self-Service Large Language Model Exploration
Abstract: This paper introduces a user-friendly platform developed by the University of Kentucky Center for Applied AI to make large, customized LLMs more accessible. Building on recent advances in multi-LoRA inference, the system efficiently serves custom adapters for a diverse range of users and projects. We outline the system's architecture and key features, including dataset curation, model training, secure inference, and text-based feature extraction. We also describe the construction of a tenant-aware computational network using agent-based methods, which securely federates islands of isolated resources into a unified system. The platform is designed to deliver secure LLM services, emphasizing process and data isolation, end-to-end encryption, and role-based resource authentication. This work supports the broader goal of simplifying access to cutting-edge AI models and technology in service of scientific discovery.
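For concreteness, the sketch below illustrates the multi-LoRA serving pattern the abstract refers to: a single shared base model answers requests that each name their own fine-tuned adapter. It uses vLLM's LoRA API as one representative engine; the model name, adapter identifiers, and filesystem paths are illustrative placeholders, not the platform's actual configuration.

```python
# Hedged sketch: multi-tenant LoRA serving on a single shared base model.
# vLLM is used as a representative multi-LoRA engine; the model and adapter
# names/paths below are placeholders, not the platform's configuration.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the base model once, with LoRA adapter support enabled.
llm = LLM(model="meta-llama/Llama-2-7b-hf", enable_lora=True, max_loras=4)

params = SamplingParams(temperature=0.7, max_tokens=128)

# Each request selects its own adapter; the base weights stay shared,
# so adapters for different tenants/projects coexist on one deployment.
tenant_a = LoRARequest("tenant-a-adapter", 1, "/adapters/tenant-a")
tenant_b = LoRARequest("tenant-b-adapter", 2, "/adapters/tenant-b")

out_a = llm.generate(["Classify this support ticket: ..."], params, lora_request=tenant_a)
out_b = llm.generate(["Extract entities from: ..."], params, lora_request=tenant_b)
print(out_a[0].outputs[0].text)
print(out_b[0].outputs[0].text)
```

Because the base weights are loaded once and each adapter is a small low-rank delta, many tenants can share one GPU deployment while keeping their adapter weights separate from one another.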