The following setup instructions assume a Linux system; they were tested on Ubuntu. We use uv to install packages and manage the virtual environment. To install uv, run:
curl -LsSf https://astral.sh/uv/install.sh | sh
Clone the repository
git clone https://github.com/recommenders-team/recommenders
Navigate to the tutorial folder. The tutorial materials are located in recommenders/examples/07_tutorials/KDD2020-tutorial.
cd recommenders/examples/07_tutorials/KDD2020-tutorial
Download the dataset
Download the dataset for the hands-on experiments and unzip it into data_folder:
wget https://recodatasets.z20.web.core.windows.net/kdd2020/data_folder.zip
unzip data_folder.zip -d data_folder
After you unzip the file, data_folder contains two folders, 'raw' and 'my_cached'. The 'raw' folder contains the original txt files from the COVID MAG dataset. The 'my_cached' folder contains processed data files; if you miss a step during the hands-on tutorial, you can catch up by copying the corresponding files into the experiment folders.
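A quick way to confirm that the archive unpacked as described is to check for the two subfolders. The following is a minimal sketch; the function name check_data_layout is hypothetical and not part of the tutorial materials, and it only assumes the 'raw' and 'my_cached' folder names given above.

```shell
# Return 0 if both expected subfolders exist under the given directory.
# check_data_layout is an illustrative helper, not part of the tutorial.
check_data_layout() {
    dir="${1:-data_folder}"
    status=0
    for sub in raw my_cached; do
        if [ -d "$dir/$sub" ]; then
            echo "found: $dir/$sub"
        else
            echo "missing: $dir/$sub"
            status=1
        fi
    done
    return "$status"
}

# Run the check from the tutorial folder, next to data_folder.
check_data_layout data_folder ||
    echo "data_folder is incomplete; re-run the download and unzip steps"
```

If a folder is reported as missing, the most likely cause is an interrupted download or running unzip from a different directory.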
Install the dependencies
Model pre-training uses a tool that converts the original data into embeddings, and this tool requires g++. The following command installs g++ on a Debian-based Linux system such as Ubuntu:
sudo apt-get install g++
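Before continuing, it can help to confirm that the command-line tools used in these instructions are available. This is a sketch under the assumption that each tool is expected on PATH; the helper name require_tools is hypothetical.

```shell
# Print whether each named command is on PATH.
# Returns non-zero if any of them is missing.
# require_tools is an illustrative helper, not part of the tutorial.
require_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "not found: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools used by the commands in this README.
require_tools uv git wget unzip g++ ||
    echo "install the missing tools before continuing"
```

If uv is reported as not found right after installation, opening a new shell usually picks up the updated PATH.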
The Python code runs in a virtual environment where the dependencies are installed. Create and activate the environment, then install the dependencies:
uv venv ~/.venvs/kdd_tutorial_2020 --python 3.11
source ~/.venvs/kdd_tutorial_2020/bin/activate
uv pip install -r requirements_kdd.txt
Note: If requirements_kdd.txt doesn't exist, you can install the dependencies manually:
uv pip install numpy pandas jupyter ipykernel scikit-learn matplotlib scipy pytest numba tensorflow
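After either install command finishes, a quick import check can catch a broken environment before the notebooks are opened. The sketch below assumes the virtual environment is active so that python resolves to its interpreter; the helper name check_imports and the PYTHON override variable are hypothetical. The module list mirrors the manual install command above and may differ from the contents of requirements_kdd.txt.

```shell
# Try to import each module with the interpreter named in $PYTHON
# (default: python, i.e. the one from the active virtual environment).
# check_imports is an illustrative helper, not part of the tutorial.
check_imports() {
    failed=0
    for module in "$@"; do
        if "${PYTHON:-python}" -c "import $module" >/dev/null 2>&1; then
            echo "ok: $module"
        else
            echo "import failed: $module"
            failed=1
        fi
    done
    return "$failed"
}

# scikit-learn is imported as sklearn; the other module names match
# their package names.
check_imports numpy pandas sklearn scipy matplotlib numba tensorflow ||
    echo "some packages are missing; re-run the install step"
```

Importing tensorflow can take several seconds, so a short pause on that line is expected.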
The tutorial is conducted in Jupyter notebooks. Register the newly created environment as a kernel with the Jupyter notebook server:
python -m ipykernel install --user --name kdd_tutorial_2020 --display-name "Python (kdd tutorial)"
After the setup, you should be able to launch the notebooks locally with the command:
jupyter notebook --port=8080
The notebooks can then be opened in a browser at localhost:8080.
Alternatively, if the Jupyter notebook server runs on a remote machine, launch it with the following command, replacing 10.214.70.89 with the IP address of your server:
jupyter notebook --no-browser --ip=10.214.70.89 --port=8080
From the local browser, the notebooks can then be opened at 10.214.70.89:8080.