Python3 Crawler (6): Intelligently Selecting Quality Funds


1. About This Project

This project crawls fund data every day and applies financial heuristics or machine-learning models to pick out the day's quality funds.

2. Crawling Fund Data

Run CrawlingFund.py in the code directory with python3.

# encoding=utf-8
import os
import time

from bs4 import BeautifulSoup
from selenium import webdriver

def extract_url_info(url="https://www.howbuy.com/fund/fundranking/", scroll_times=0):

    # Open the page in Chrome (raw string so the Windows path survives intact).
    driver = webdriver.Chrome(executable_path=r"E:\软件安装\Google Chrome\chromedriver_win32_87\chromedriver.exe")
    driver.get(url)
    time.sleep(2)  # give the page a moment to load
    cookie = driver.get_cookies()  # fetched but not used further
    time.sleep(2)

    # Scroll to the bottom repeatedly so lazy-loaded rows appear.
    def execute_times(times):
        for i in range(times):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(2)
    execute_times(scroll_times)

    info_list = []
    # Define the header outside the loop so it exists even if no row parses.
    header = "基金代码,基金名称,日期,净值,近一周,近一月,近三月,近半年,近一年,今年以来,基金的详细链接"

    html = driver.page_source
    soup = BeautifulSoup(html, 'lxml')
    count = 0
    for info in soup.find_all('tr'):
        try:
            tds = info.find_all('td')
            # Fund detail link and name
            link = tds[3].find('a').get('href')
            name = tds[3].find('a').contents[0]
            # Fund code
            code = tds[0].find('input').get("value")
            # Date, NAV, and change over 1 week / 1 month / 3 months / 6 months / 1 year / YTD
            date = tds[4].contents[0]
            nav = tds[5].contents[0]
            week_change = tds[6].find('span').contents[0]
            month_change = tds[7].find('span').contents[0]
            threemonth_change = tds[8].find('span').contents[0]
            halfyear_change = tds[9].find('span').contents[0]
            year_change = tds[10].find('span').contents[0]
            thisyear_change = tds[11].find('span').contents[0]
            # Sanity check: the fund code should appear in its detail link.
            assert str(code) in str(link)
            result = "{},{},{},{},{},{},{},{},{},{},{}".format(
                code, name, date, nav, week_change, month_change, threemonth_change,
                halfyear_change, year_change, thisyear_change, link)
            info_list.append(result)
            count += 1
        except Exception as e:
            print("error", e)
    print("number of funds", count)
    driver.quit()
    return info_list, header

def main():
    # Ranking pages: equity, bond, hybrid, wealth-management, money-market,
    # index, structured, QDII, and hedge funds.
    url_list = ["https://www.howbuy.com/fund/fundranking/gupiao.htm", "https://www.howbuy.com/fund/fundranking/zhaiquan.htm",
                "https://www.howbuy.com/fund/fundranking/hunhe.htm", "https://www.howbuy.com/fund/fundranking/licai.htm",
                "https://www.howbuy.com/fund/fundranking/huobi.htm", "https://www.howbuy.com/fund/fundranking/zhishu.htm",
                "https://www.howbuy.com/fund/fundranking/jiegou.htm", "https://www.howbuy.com/fund/fundranking/qdii.htm",
                "https://www.howbuy.com/fund/fundranking/duichong.htm"]
    if not os.path.exists('./result/'):
        os.makedirs('./result/')

    for url in url_list:
        fund_type = url.split("/")[-1].replace(".htm", "")  # avoid shadowing the built-in `type`
        print("type", fund_type)
        info_list, header = extract_url_info(url=url, scroll_times=13)
        with open("./result/haomai_{}.csv".format(fund_type), "w", encoding="utf-8") as f:
            f.write(header)
            f.write('\n')
            for info in info_list:
                f.write(str(info))
                f.write('\n')


if __name__ == "__main__":
    main()
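The row-parsing logic in `extract_url_info` can be exercised without a browser on a static HTML fragment. Below is a sketch against a minimal hand-written row that mimics the assumed table layout (the real page's markup may differ); `html.parser` is used so no lxml install is needed:

```python
from bs4 import BeautifulSoup

# A minimal fragment mimicking the assumed ranking-row layout:
# td[0] holds the code in an <input>, td[3] the link and name,
# td[4] the date, td[5] the NAV, td[6..11] the changes in <span>s.
html = """
<table><tr>
  <td><input value="515700"></td><td></td><td></td>
  <td><a href="https://www.howbuy.com/fund/515700">平安中证新能源汽车ETF</a></td>
  <td>01-26</td><td>2.3064</td>
  <td><span>8.42%</span></td><td><span>13.16%</span></td>
  <td><span>54.46%</span></td><td><span>81.46%</span></td>
  <td><span>111.46%</span></td><td><span>10.77%</span></td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
tds = soup.find("tr").find_all("td")
code = tds[0].find("input").get("value")
name = tds[3].find("a").contents[0]
link = tds[3].find("a").get("href")
week_change = tds[6].find("span").contents[0]
print(code, name, link, week_change)
```

Testing the extraction on a fixture like this makes it much easier to debug selector changes than re-crawling the live site each time.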

Source site: Howbuy fund rankings, https://www.howbuy.com/fund/fundranking

The crawler covers equity, bond, hybrid, wealth-management, money-market, index, structured, hedge, and QDII funds, saving each category as a CSV file.

Fields crawled (fund code, name, date, NAV, 1-week/1-month/3-month/6-month/1-year/YTD change, detail link):

基金代码,基金名称,日期,净值,近一周,近一月,近三月,近半年,近一年,今年以来,基金的详细链接
515700,平安中证新能源汽车ETF,01-26,2.3064,8.42%,13.16%,54.46%,81.46%,111.46%,10.77%,https://www.howbuy.com/fund/515700
501057,汇添富中证新能源汽车A,01-26,2.3612,8.09%,12.42%,50.39%,76.87%,109.81%,10.49%,https://www.howbuy.com/fund/501057
501058,汇添富中证新能源汽车C,01-26,2.3428,8.09%,12.39%,50.29%,76.64%,109.29%,10.47%,https://www.howbuy.com/fund/501058
159806,国泰中证新能源汽车ETF,01-26,2.2750,6.30%,9.58%,48.43%,76.53%,--,7.60%,https://www.howbuy.com/fund/159806
515030,华夏中证新能源汽车ETF,01-26,1.7843,6.23%,9.27%,47.87%,72.93%,--,7.34%,https://www.howbuy.com/fund/515030
165520,中信保诚中证800有色指数(LOF),01-26,1.4670,7.79%,12.67%,44.45%,44.87%,63.02%,13.11%,https://www.howbuy.com/fund/165520
161028,富国中证新能源汽车指数,01-26,1.1330,5.89%,8.32%,44.19%,67.27%,87.72%,6.79%,https://www.howbuy.com/fund/161028
009067,国泰中证新能源汽车ETF联接A,01-26,2.1231,6.06%,8.54%,43.48%,68.50%,--,7.32%,https://www.howbuy.com/fund/009067
009068,国泰中证新能源汽车ETF联接C,01-26,2.1182,6.05%,8.50%,43.37%,68.24%,--,7.29%,https://www.howbuy.com/fund/009068
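The percentage columns in these CSVs are strings like `8.42%`, with `--` for missing values. A minimal sketch (assuming pandas is available) of loading the sample above into numeric form; the column names come from the CSV header written by the crawler:

```python
import io

import pandas as pd

# Two rows of the sample output above, embedded for a self-contained demo.
csv_text = """基金代码,基金名称,日期,净值,近一周,近一月,近三月,近半年,近一年,今年以来,基金的详细链接
515700,平安中证新能源汽车ETF,01-26,2.3064,8.42%,13.16%,54.46%,81.46%,111.46%,10.77%,https://www.howbuy.com/fund/515700
159806,国泰中证新能源汽车ETF,01-26,2.2750,6.30%,9.58%,48.43%,76.53%,--,7.60%,https://www.howbuy.com/fund/159806"""

# Keep fund codes as strings so leading zeros survive.
df = pd.read_csv(io.StringIO(csv_text), dtype={"基金代码": str})

# Strip the trailing "%" and coerce "--" to NaN.
pct_cols = ["近一周", "近一月", "近三月", "近半年", "近一年", "今年以来"]
for col in pct_cols:
    df[col] = pd.to_numeric(df[col].str.rstrip("%"), errors="coerce")

print(df[["基金代码", "近一周", "近一年"]])
```

With the columns numeric, the screening rules in section 3 become simple vectorized filters.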

Crawling the Latest Estimated NAV

#!/usr/bin/env python
# coding: utf-8

import json
import re

import requests

def get_url(url, params=None):
    """
    Fetch the latest estimated NAV of a fund from Tiantian Fund.
    e.g. http://fundgz.1234567.com.cn/js/161725.js?rt=1589463125600
    return: response body (a JSONP string)
    """
    rsp = requests.get(url, params=params)
    rsp.raise_for_status()
    return rsp.text

def get_fund_data(jjcode, base_url="http://fundgz.1234567.com.cn/js/"):
    """
    Request and parse the payload. The endpoint returns JSONP, e.g.
    jsonpgz({"fundcode":"161725", ...}); so extract the JSON with a regex
    rather than an HTML parser.
    """
    url = base_url + str(jjcode) + ".js"
    text = get_url(url)
    match = re.search(r"jsonpgz\((.*)\)", text)
    if match is None:
        return None
    data = json.loads(match.group(1))
    print(data)
    return data

def main():
    # 161725 is a sample fund code.
    get_fund_data(jjcode=161725)

if __name__ == "__main__":
    main()

3. Screening Quality Funds (Work in Progress)

3.1 Financial Heuristics

The 4433 rule: a good fund should satisfy all of the following at once:

1. Its 1-year return ranks in the top 1/4 of comparable funds;

2. Its 2-, 3-, and 5-year returns each rank in the top 1/4 of comparable funds;

3. Its 3-month return ranks in the top 1/3 of comparable funds;

4. Its 6-month return ranks in the top 1/3 of comparable funds.
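As a sketch, the 4433 screen can be expressed as percentile filters over a per-fund returns table. The toy data and column names below are hypothetical, and since the crawled data only reaches back one year, the 2/3/5-year checks would need additional history:

```python
import pandas as pd

def rule_4433(df):
    """Keep funds whose percentile rank (best return = smallest rank)
    is within the 4433 cutoff for every listed period."""
    cutoffs = {
        "近一年": 1 / 4,  # top 1/4 over the past year
        "近三月": 1 / 3,  # top 1/3 over the past 3 months
        "近半年": 1 / 3,  # top 1/3 over the past 6 months
    }
    mask = pd.Series(True, index=df.index)
    for col, cut in cutoffs.items():
        # rank(ascending=False, pct=True): the best performer gets 1/n.
        mask &= df[col].rank(ascending=False, pct=True) <= cut
    return df[mask]

# Toy table: four funds with returns (%) over three periods.
funds = pd.DataFrame(
    {"近三月": [10.0, 2.0, 8.0, 1.0],
     "近半年": [20.0, 5.0, 15.0, 2.0],
     "近一年": [40.0, 10.0, 30.0, 3.0]},
    index=["A", "B", "C", "D"])
print(rule_4433(funds).index.tolist())
```

In this toy table only fund A clears every cutoff: with four funds, the top 1/4 admits a single fund per period.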

3.2 Machine-Learning Models

Using each fund's gain over a chosen horizon (weekly, monthly, semi-annual, etc.) as the prediction target, build a regression model that forecasts the gain of every fund.
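A minimal sketch of such a regression with NumPy's least-squares solver. The data here is synthetic; in practice the features would be the crawled historical returns (e.g. the 1-week/1-month/3-month columns) and the target the next period's change:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features: 200 funds x 3 past-return columns.
X = rng.normal(size=(200, 3))
true_w = np.array([0.5, 0.3, 0.1])
# Target: next-period gain, a noisy linear function of past returns.
y = X @ true_w + 0.01 * rng.normal(size=200)

# Ordinary least squares via lstsq, with an intercept column appended.
A = np.hstack([X, np.ones((200, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef.round(2))  # recovered weights followed by the intercept

# Predict the next-period gain for a new fund's feature vector.
x_new = np.array([0.2, -0.1, 0.05, 1.0])
print(float(x_new @ coef))
```

Ranking funds by the predicted gain would then play the same role as the 4433 screen above, with the advantage that the horizon is an explicit modeling choice.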

Project GitHub repository; stars are welcome.
