Environment setup

examples/07_tutorials/KDD2020-tutorial/README.md

The following setup instructions assume a Linux system; they were tested on Ubuntu. We use uv to install packages and manage the virtual environment. To install uv, run:

bash
curl -LsSf https://astral.sh/uv/install.sh | sh
  1. Clone the repository

    bash
    git clone https://github.com/recommenders-team/recommenders
    
  2. Navigate to the tutorial folder. The tutorial materials are located in recommenders/examples/07_tutorials/KDD2020-tutorial.

    bash
    cd recommenders/examples/07_tutorials/KDD2020-tutorial
    
  3. Download the dataset

    Download the dataset for the hands-on experiments and unzip it to data_folder:

    bash
    wget https://recodatasets.z20.web.core.windows.net/kdd2020/data_folder.zip
    unzip data_folder.zip -d data_folder
    

    After you unzip the file, data_folder contains two folders: 'raw' and 'my_cached'. The 'raw' folder contains the original txt files from the COVID MAG dataset. The 'my_cached' folder contains processed data files; if you miss some steps during the hands-on tutorial, you can catch up by copying the corresponding files into the experiment folders.

  4. Install the dependencies

    1. Model pre-training uses a tool that converts the original data into embeddings. This tool requires g++. The following command installs g++ on Ubuntu or another Debian-based system:

      bash
      sudo apt-get install g++
      
    2. The Python scripts run in a virtual environment where the dependencies are installed. Create and activate the environment, then install the dependencies:

      bash
      uv venv ~/.venvs/kdd_tutorial_2020 --python 3.11
      source ~/.venvs/kdd_tutorial_2020/bin/activate
      uv pip install -r requirements_kdd.txt
      

      Note: If requirements_kdd.txt doesn't exist, you can install the dependencies manually:

      bash
      uv pip install numpy pandas jupyter ipykernel scikit-learn matplotlib scipy pytest numba tensorflow
      
  5. The tutorial is conducted in Jupyter notebooks. Register the newly created environment as a kernel with the Jupyter notebook server:

    bash
    python -m ipykernel install --user --name kdd_tutorial_2020 --display-name "Python (kdd tutorial)"
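
Before opening the notebooks, the setup steps above can be spot-checked with a short script. The following is a minimal sketch, not part of the tutorial materials: the helper names `have_cmd` and `check_data_folder` are hypothetical, and the script assumes it is run from the tutorial folder after step 3.

```shell
# Sanity checks for the setup steps (a sketch; helper names are hypothetical).

# Return 0 if the given command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print each expected subfolder that is missing under the data folder ($1).
# Returns 0 only when both 'raw' and 'my_cached' exist.
check_data_folder() {
    cdf_status=0
    for cdf_sub in raw my_cached; do
        if [ ! -d "$1/$cdf_sub" ]; then
            echo "missing: $1/$cdf_sub"
            cdf_status=1
        fi
    done
    return "$cdf_status"
}

# Report on the tools the instructions rely on.
for tool in uv g++ jupyter; do
    if have_cmd "$tool"; then
        echo "found: $tool"
    else
        echo "not found: $tool"
    fi
done

# Check the unzipped dataset layout described in step 3.
check_data_folder data_folder || echo "data_folder is incomplete; repeat step 3"

# List registered kernels; kdd_tutorial_2020 should appear after step 5.
if have_cmd jupyter; then
    jupyter kernelspec list || echo "could not list Jupyter kernels"
fi
```

If any tool is reported as not found, or a data subfolder is reported missing, repeat the corresponding step before launching the notebooks.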
    

Tutorial notebooks/scripts

After the setup, users should be able to launch the notebooks locally with the command:

bash
jupyter notebook --port=8080

The notebook can then be opened in a browser at localhost:8080. Alternatively, if the Jupyter notebook server runs on a remote machine, users can launch it with the following command, replacing 10.214.70.89 with the IP address of that machine:

bash
jupyter notebook --no-browser --ip=10.214.70.89 --port=8080

The notebook can then be opened from the local browser at 10.214.70.89:8080.
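
Because the IP address and port differ from one server to another, it can help to keep them in variables. The following is a minimal sketch: `SERVER_IP`, `PORT`, and the `notebook_url` helper are hypothetical names introduced for illustration, and the sketch assumes the server is reachable over plain HTTP, which is the Jupyter default when no certificate is configured.

```shell
# Placeholders (assumptions): override SERVER_IP and PORT for your own server.
SERVER_IP="${SERVER_IP:-10.214.70.89}"
PORT="${PORT:-8080}"

# Build the address to open in the local browser.
notebook_url() {
    printf 'http://%s:%s\n' "$1" "$2"
}

# Print the command to run on the server and the matching browser address.
echo "On the server:  jupyter notebook --no-browser --ip=$SERVER_IP --port=$PORT"
echo "In the browser: $(notebook_url "$SERVER_IP" "$PORT")"
```

For example, running the script with `SERVER_IP=192.168.1.20 PORT=8888` set in the environment prints the launch command and address for that server instead.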